CLONING, EXPRESSION AND CHARACTERIZATION OF THE SPG4 GENE 
RESPONSIBLE FOR THE MOST COMMON FORM OF AUTOSOMAL DOMINANT 
SPASTIC PARAPLEGIA. 

The invention relates to the identification and characterization of the SPG4 gene
encoding spastin, which is responsible for the most common form of autosomal
dominant hereditary spastic paraplegia (HSP), to the cloning and characterization of its
cDNA, and also to the corresponding polypeptides. The invention also relates to
vectors, to transformed cells and to transgenic animals, and also to diagnostic methods
and to methods for selecting a chemical or biochemical compound capable of
interacting directly or indirectly with a polypeptide according to the invention.

Hereditary spastic paraplegias (HSPs) are degenerative disorders of the central
nervous system, characterized by bilateral and progressive spasticity of the lower
limbs. They manifest clinically as difficulties in walking, possibly evolving
into total paralysis of both legs. The physiopathology of this set of diseases is, to date,
relatively undocumented; however, anatomopathological data make it possible to
conclude that the attack is limited to the pyramidal tracts responsible for voluntary
motricity in the spinal cord (1). Various clinical and genetic forms of HSP exist. The
so-called "pure" HSPs, which correspond to isolated spasticity of the lower limbs, are
clinically distinguished from the "complex" HSPs, for which the spasticity of the legs is
associated with other clinical signs of neurological or non-neurological type (2). From a
genetic point of view, the HSPs can be transmitted according to the autosomal
dominant (AD-HSP), autosomal recessive (AR-HSP) or X-linked (X-HSP) mode. The
"pure" form of HSP, which is most commonly transmitted according to the autosomal
dominant mode, remains the most frequent (approximately 80% of HSPs) (1). The
incidence of HSPs, which remains difficult to estimate because epidemiological
studies are rare and clinical variability is considerable, varies from 0.9 : 100 000 in
Denmark (3) to 9.6 : 100 000 in certain regions of Spain (4) or 14 : 100 000 in Norway (5)
(approximately 3 : 100 000 in France).

In addition to this great clinical variability, which is observed not only between
various families but also between various affected members of the same family, the
HSPs are also characterized by considerable genetic heterogeneity. In the case of
AD-HSPs, four loci have been identified, to date, on chromosomes 14 (locus SPG3)
(6), 2 (locus SPG4) (7, 8), 15 (locus SPG6) (9) and 8 (locus SPG8) (10). The study of a
large number of families exhibiting an AD-HSP has shown that the gene carried by
chromosome 2 is a main locus of this form of the disease, found in 40 to 50% of the
families analyzed (11, 12). An anticipation phenomenon was observed in some locus
SPG4-linked HSP families; this phenomenon has, subsequently, been associated with
the expansion of a (CAG)n repeat demonstrated in 6 Danish families (13) using the
RED (for Rapid Expansion Detection) technique. It has, however, never been possible
to confirm this expansion in any of the families tested by this method or by the
systematic search for sequences of (CAG)n type in physical maps composed of YAC
(for Yeast Artificial Chromosome) or BAC (for Bacterial Artificial Chromosome) clones
(Hazan et al., in press, Genomics).

To date, three genes responsible for two forms of X-HSP and one form of
AR-HSP have been identified. Mutations in the gene which encodes a neuron-specific cell
adhesion molecule, L1-CAM (for L1 Cell Adhesion Molecule), and which is located at
Xq28 (locus SPG1) cause a complex form of HSP (14) in which the spasticity is
associated with a mental handicap, whereas mutations in the PLP (for ProteoLipid
Protein) gene located at Xq21 (locus SPG2), which encodes a constitutive molecule of
the myelin layer, cause pure and complex forms of X-HSP (15). More recently,
mutations in the gene located at 16q24.3 (locus SPG7), which encodes paraplegin, a
mitochondrial ATPase of the AAA (for "ATPases Associated with diverse cellular
Activities") protein family (16), have been associated with complex and pure forms of
AR-HSP (17), suggesting that alterations to oxidative phosphorylation (OXPHOS) may
be the cause of HSP.

Thus, there remains, today, a great need to identify and characterize the gene
responsible for the most common form of AD-HSP. The identification of this gene
should, in particular, allow, besides the possibility of a test for antenatal screening in
the families concerned, a better understanding of some of the molecular mechanisms
engendering these degenerations specific for nerve bundles of the spinal cord, or even
make it possible to provide an elementary response regarding therapeutic treatment for
the patients.

This is precisely the subject of the present invention. 

After having delimited the localization range between the D2S352 and D2S2347
genetic markers by studying recombination events in locus SPG4-linked HSP families,
the inventors have established a contig of BACs covering a physical distance evaluated
at approximately 1.5 Mb and have undertaken a positional cloning strategy based on
sequencing the SPG4 range in order to completely identify all the genes located in the
candidate region. The analysis of the sequence of the two BACs, D (b336P14) and
G (B763N4), has revealed the presence of a gene which is composed of 17 exons,
extending over a distance of approximately 100 kb, and which exhibits homology with
the genes encoding proteins of the AAA family. Comparison of the sequence of this
gene between the healthy and affected individuals of AD-HSP families has made it
possible to demonstrate various mutations in the patients.

A subject of the invention is thus the identification and characterization of the 
SPG4 (or SPAST) gene encoding a novel nuclear member of the AAA family, 
responsible for the most common form of AD-HSP. 

In a first aspect, a subject of the present invention is a purified or isolated
nucleic acid of the SPG4 gene, characterized in that it comprises at least 15
consecutive nucleotides, preferably 20, 25, 30, 50, 100 or 200 consecutive nucleotides,
of a sequence chosen from the group comprising:

- the sequence SEQ ID No. 1, which is a genomic sequence of the human SPG4 gene;

- the nucleic acid sequences which are homologues or variants of the nucleic acid of
sequence SEQ ID No. 1;

- the sequence which is complementary thereto; and

- the sequence of the corresponding RNA thereof.

The present invention relates, of course, to both the DNA and RNA sequences,
and also the sequences which hybridize with them, as well as the corresponding
double-stranded DNAs.
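
By way of illustration only, the relationships named above (complementary sequence, corresponding RNA, and a fragment of at least 15 consecutive nucleotides) can be sketched as follows. The function names and the toy sequences are hypothetical and are not taken from SEQ ID No. 1; this is a minimal sketch, not a definition of the claimed subject matter.

```python
# Hypothetical helpers; toy sequences only, not SEQ ID No. 1.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def to_rna(dna):
    """Return the RNA having the same sequence as the given DNA strand (T -> U)."""
    return dna.upper().replace("T", "U")


def shares_fragment(candidate, reference, min_len=15):
    """True if candidate contains at least min_len consecutive nucleotides of reference."""
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    candidate, reference = candidate.upper(), reference.upper()
    # Every window of min_len nucleotides found in the reference.
    windows = {reference[i:i + min_len]
               for i in range(len(reference) - min_len + 1)}
    return any(candidate[i:i + min_len] in windows
               for i in range(len(candidate) - min_len + 1))


if __name__ == "__main__":
    toy_reference = "ACGTTGCAAGGCTTAACCGGTTAGC"
    print(reverse_complement("ATGC"))
    print(to_rna("ATGC"))
    print(shares_fragment("GG" + toy_reference[3:20] + "GG", toy_reference))
```

The default of 15 mirrors the minimum length stated in the text; the preferred lengths (20, 25, 30, 50, 100 or 200) can be passed as `min_len`.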

The terms "nucleic acid", "nucleic acid sequence" or "sequence of nucleic acid",
"polynucleotide", "oligonucleotide", "polynucleotide sequence", and "nucleotide
sequence", which will be used equally in the present description, will be intended to
refer to both a double-stranded DNA, a single-stranded DNA and products of
transcription of said DNAs, and/or an RNA fragment, said isolated natural, or synthetic
fragments which may or may not include unnatural nucleotides, referring to a precise
series of nucleotides, which may or may not be modified, making it possible to define a
fragment or a region of a nucleic acid. The expression "natural isolated, or synthetic
DNA and/or RNA fragment, which may or may not include unnatural nucleotides" is
intended to mean a precise series of nucleotides, which may or may not be modified,
making it possible to define a fragment, a segment or a region of a nucleic acid.

It should be understood that the present invention does not relate to the
genomic nucleotide sequences in their natural chromosomal environment, i.e. in the
natural state. It involves sequences which have been isolated and/or purified, i.e. they
have been removed directly or indirectly, for example by copying, their environment
having been at least partially modified.

The term "homologous nucleic acid sequence" is intended to refer to the
sequences which have, with respect to the reference nucleic acid sequence, certain
modifications, such as in particular a deletion, a truncation, an extension, a chimeric
fusion and/or a mutation, in particular a point mutation, and the nucleic acid sequence
of which shows at least 80%, preferably 90% or 95%, identity after alignment, with the
reference nucleic acid sequence. It preferably involves sequences for which the
complementary sequences are capable of hybridizing specifically with one of the
sequences of the invention. Preferably, the specific or high stringency hybridization
conditions will be such that they ensure at least 80%, preferably 90% or 95%, identity
after alignment between one of the two sequences and the sequence which is
complementary to the other.
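
As a hedged illustration of the "identity after alignment" criterion, the sketch below computes a percentage identity for two sequences that have already been aligned (gaps written as "-"). The text does not specify how identity is counted; dividing the number of identical columns by the number of alignment columns is one common convention and is an assumption here, as are the function names.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue in both sequences.

    Assumption: gap-containing columns count as mismatches, and columns
    that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b)
               for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the identity reaches the threshold (80%, preferably 90% or 95%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy alignment with one mismatch in ten columns.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(meets_threshold("ACGTACGTAC", "ACGTACGTAA", threshold=95.0))
```

The alignment itself is not performed here; any standard pairwise alignment tool could supply the two gapped strings.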

Hybridization under high stringency conditions means that the temperature and
ionic strength conditions are chosen such that they allow the hybridization between two
complementary DNA fragments to be maintained. By way of illustration, high stringency
conditions of the hybridization step for the purposes of defining the polynucleotide
fragments described above are advantageously as follows.

The DNA-DNA or DNA-RNA hybridization is carried out in two steps:

(1) prehybridization at 42°C for 3 hours in phosphate buffer (20 mM, pH 7.5) containing
5 x SSC (1 x SSC corresponds to a 0.15 M NaCl + 0.015 M sodium citrate solution),
50% of formamide, 7% of sodium dodecyl sulphate (SDS), 10 x Denhardt's, 5% of
dextran sulphate and 1% of salmon sperm DNA; (2) actual hybridization for 20 hours at
a temperature dependent on the size of the probe (i.e. 42°C for a probe of size > 100
nucleotides), followed by two 20-minute washes at 20°C in 2 x SSC + 2% SDS and one
20-minute wash at 20°C in 0.1 x SSC + 0.1% SDS. The final wash is carried out in
0.1 x SSC + 0.1% SDS for 30 minutes at 60°C for a probe of size > 100 nucleotides.
The high stringency hybridization conditions described above for a polynucleotide of
defined size will be adjusted by those skilled in the art for oligonucleotides of greater or
smaller size, according to the teaching of Sambrook et al., 1989.
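
For orientation, the salt concentrations implied by the SSC strengths quoted above follow directly from the stated definition of 1 x SSC (0.15 M NaCl + 0.015 M sodium citrate). The short sketch below performs that arithmetic; the function and variable names are hypothetical.

```python
# 1 x SSC as defined in the text, in mol/L.
SSC_1X = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(fold):
    """Return the molar concentration of each SSC component at the given strength."""
    if fold < 0:
        raise ValueError("fold must be non-negative")
    return {solute: fold * molar for solute, molar in SSC_1X.items()}


if __name__ == "__main__":
    # Strengths used in the prehybridization and wash steps described above.
    for fold in (5, 2, 0.1):
        conc = ssc_molarity(fold)
        print(f"{fold} x SSC: {conc['NaCl']:.4f} M NaCl, "
              f"{conc['sodium citrate']:.4f} M sodium citrate")
```

Lowering the SSC strength from 2 x to 0.1 x reduces the ionic strength of the wash, which is what makes the later washes more stringent.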

The term "nucleic acid sequence which is a variant" or "nucleic acid which is a
variant" of a reference nucleic acid sequence will be intended to refer to the set of
nucleic acid sequences corresponding to allelic variants, i.e. individual variations of the
reference nucleic acid sequence. These natural mutated sequences correspond to
polymorphisms present in mammals, in particular in human beings, and in particular to
polymorphisms which can cause a pathology to occur and/or to develop.

While the sequences according to the invention relate to normal sequences,
they also relate to sequences which are mutated insofar as they include at least one
point mutation, and preferably at most 10% of mutations, with respect to the normal
sequence.

In particular, the variant nucleic acid sequences will comprise any sequence of
at least 15 consecutive nucleotides, preferably 20, 25, 30, 50, 100 or 200 consecutive
nucleotides, of a polymorphic sequence of the genomic sequence of the human SPG4
gene of sequence SEQ ID No. 1, and the nucleic acid sequence of which has, with
respect to the sequence SEQ ID No. 1, at least one mutation corresponding in
particular to a truncation, deletion, substitution and/or addition of an amino acid
residue. In the present case, the variant nucleic acid sequences having at least one
mutation will herein be linked to the pathologies of AD-HSP type linked to the SPG4 locus.

Preferably, the present invention relates to the mutated nucleic acid sequences
in which the mutations produce a modification of the amino acid sequence of the
polypeptide encoded by the normal sequence.

The term "variant nucleic acid sequences" will also be intended to refer to any
RNA or cDNA resulting from a mutation of a splice site of the genomic nucleic acid
sequence SEQ ID No. 1.

The invention preferably relates to a purified or isolated nucleic acid according
to the present invention, characterized in that it comprises a sequence chosen from the
following group:

- the sequence SEQ ID No. 1;

- the sequence SEQ ID No. 2, which is the cDNA sequence encoding human spastin;

- the sequence SEQ ID No. 72, sequence of the incomplete cDNA encoding murine
spastin represented in Figure 5, "mouse" line;

- the nucleic acid sequences which are homologues or variants of the sequences SEQ
ID No. 1, SEQ ID No. 2 or SEQ ID No. 72;

- the sequence complementary thereto; and

- the sequence of the corresponding RNA thereof.

Preferably, the invention relates to a purified or isolated nucleic acid according
to the invention, characterized in that it comprises at least one mutation the position
and nature of which are identified in Table 5.

The primers or probes, characterized in that they comprise a sequence of a
nucleic acid according to the invention, also form part of the invention.

The present invention thus relates to the set of primers which can be deduced
from the nucleotide sequences of the invention and which may make it possible to
demonstrate said nucleotide sequences of the invention, in particular the mutated
sequences, using in particular an amplification method such as the PCR method, or a
related method.

The present invention also relates to the set of probes which can be deduced
from the nucleotide sequences of the invention, in particular from the sequences
capable of hybridizing with them, and which may make it possible to demonstrate said
nucleotide sequences, in particular to distinguish the normal sequences from the
mutated sequences.

The present invention relates, in particular, to the probes or primers having
sequences chosen from the sequences SEQ ID No. 4 to SEQ ID No. 71.

The invention also relates to the use of a nucleic acid sequence according to
the invention as a probe or primer, for detecting, identifying, assaying or amplifying a
nucleic acid sequence.

According to the invention, the polynucleotides which can be used as a probe or
as a primer in processes for detecting, identifying, assaying or amplifying a nucleic acid
sequence will have a minimum size of 15 bases, preferably of 20 bases, or better still
of 25 to 30 bases.

The set of probes and primers according to the invention may be labelled
directly or indirectly with a radioactive or nonradioactive compound, using methods well
known to those skilled in the art, in order to obtain a detectable and/or quantifiable
signal.

The nonlabelled polynucleotide sequences according to the invention can be 
used directly as a probe or primer. 

The sequences are generally labelled so as to obtain sequences which can be
used for many applications. The labelling of the primers or of the probes according to
the invention is carried out with radioactive elements or with nonradioactive molecules.

Among the radioactive isotopes used, mention may be made of 32P, 33P, 35S, 3H
or 125I. The nonradioactive entities are selected from ligands, such as biotin, avidin or
streptavidin, digoxigenin, haptens, colorants and luminescent agents, such as
radioluminescent, chemiluminescent, bioluminescent, fluorescent or phosphorescent
agents.



The polynucleotides according to the invention can thus be used as a primer 
and/or probe in processes using, in particular, the PCR (polymerase chain reaction) 
technique (Erlich, 1989; Innis et al., 1990, and Rolfs et al., 1991). This technique
requires choosing pairs of oligonucleotide primers framing the fragment which must be 
amplified. Reference may, for example, be made to the technique described in 
American patent US No. 4,683,202. The amplified fragments can be identified, for 
example after agarose or polyacrylamide gel electrophoresis, or after a 
chromatographic technique such as gel filtration or ion exchange chromatography, and 
then sequenced. The specificity of amplification can be controlled using, as a primer, 
the nucleotide sequences of polynucleotides of the invention and, as a template, plasmids
containing these sequences or the derived amplification products. The amplified 
nucleotide fragments can be used as reagents in hybridization reactions in order to 
demonstrate the presence, in a biological sample, of a target nucleic acid having a 
sequence complementary to that of said amplified nucleotide fragments. 
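
The idea of a primer pair "framing" the fragment to be amplified can be illustrated with a small in-silico sketch. It assumes exact primer matches on a single-stranded template written 5'->3' and ignores annealing temperature, mismatches and all other experimental parameters; the function names and toy sequences are hypothetical and do not correspond to SEQ ID No. 4 to SEQ ID No. 71.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def predicted_amplicon(template, forward, reverse):
    """Return the fragment framed by the two primers, or None if either fails to match.

    The forward primer must occur on the given strand; the reverse primer
    must match the opposite strand downstream of the forward primer.
    """
    template = template.upper()
    forward = forward.upper()
    start = template.find(forward)
    if start == -1:
        return None
    # Site of the reverse primer, expressed on the template strand.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


if __name__ == "__main__":
    toy_template = "AAACCGGTTACGATCGATTTGGCCAAT"
    product = predicted_amplicon(toy_template, "CCGGTT", "GGCCAA")
    print(product, len(product))
```

The toy primers are six bases long for readability only; the text specifies a minimum of 15 bases for primers according to the invention.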

The invention is also directed toward the nucleic acids which can be obtained 
by amplification using primers according to the invention. 

Other techniques for amplifying the target nucleic acid can be advantageously 
employed as an alternative to PCR (PCR-like), using pairs of primers having nucleotide 
sequences according to the invention. The term "PCR-like" will be intended to refer to 
all methods using direct or indirect reproductions of nucleic acid sequences, or in which 
the labelling systems have been amplified. These techniques are, of course, known. In 
general, they involve amplifying the DNA with a polymerase; when the sample of origin 
is an RNA, it is advisable to perform reverse transcription beforehand. There are, 
currently, a great many processes which enable this amplification, such as for example 
the SDA (Strand Displacement Amplification) technique (Walker et al., 1992), the TAS
(Transcription-based Amplification System) technique described by Kwoh et al. in
1989, the 3SR (Self-Sustained Sequence Replication) technique described by Guatelli
et al. in 1990, the NASBA (Nucleic Acid Sequence Based Amplification) technique
described by Kievits et al. in 1991, the TMA (Transcription Mediated Amplification)
technique, the LCR (Ligase Chain Reaction) technique described by Landegren et al.
in 1988 and improved by Barany et al. in 1991, which uses a heat-stable ligase, the
RCR (Repair Chain Reaction) technique described by Segev in 1992, the CPR (Cycling
Probe Reaction) technique described by Duck et al. in 1990, and the Q-beta-replicase
amplification technique described by Miele et al. in 1983 and improved, in particular, by
Chu et al. in 1986 and Lizardi et al. in 1988, and then by Burg et al., and also by Stone
et al., in 1996.

When the target polynucleotide to be detected is an mRNA, use will
advantageously be made, prior to carrying out an amplification reaction using the
primers according to the invention or carrying out a detection process using the probes
of the invention, of an enzyme of reverse transcriptase type in order to obtain a cDNA
from the mRNA contained in the biological sample. The cDNA obtained will then serve
as a target for the primers or probes used in the amplification or detection process
according to the invention.

The probe hybridization technique can be carried out in diverse ways (Matthews
et al., 1988). The most general method consists in immobilizing the nucleic acid
extracted from the cells of various tissues or from cells in culture, on a support (such as
nitrocellulose, nylon or polystyrene), and in incubating the immobilized target nucleic
acid with the probe, under well defined conditions. After hybridization, the excess probe
is eliminated and the hybrid molecules formed are detected using the appropriate
method (measurement of the radioactivity, of the fluorescence or of the enzymatic
activity linked to the probe).

According to another embodiment of the nucleic acid probes according to the
invention, the latter can be used as a capture probe. In this case, a probe, termed
"capture probe", is immobilized on a support and is used to capture, by specific
hybridization, the target nucleic acid obtained from the biological sample to be tested,
and the target nucleic acid is then detected using a second probe, termed "detection
probe", labelled with an easily detectable element.

The splice acceptor or donor site sequences identified in Table 3 also form part
of the present invention.

In another aspect, the invention comprises a method for screening cDNA or
genomic DNA libraries, or for cloning isolated genomic DNA or cDNA encoding spastin,
characterized in that it uses a nucleic acid sequence according to the invention.
Among these methods, mention may be made in particular of:
- the screening of cDNA libraries and the cloning of the isolated cDNAs (Sambrook et
al., 1989; Suggs et al., 1981; Woo et al., 1979), using the nucleic acid sequences
according to the invention;
- the screening of genomic libraries, for example of BACs (Chumakov et al., 1992;
Chumakov et al., 1995), and, optionally, a genetic analysis by FISH (Cherif et al.,
1990), using sequences according to the invention, enabling the isolation and
chromosomal localization, and then the complete sequencing, of the SPG4 gene
encoding spastin.

In particular, these methods according to the invention may be used for
identifying and thus obtaining the genomic sequence or the cDNA of the SPG4 gene in
other mammals, in particular mice.

These screening and/or cloning methods will comprise, in particular, a step of 
hybridization of a nucleic acid according to the invention with a nucleic acid contained 
in a genomic or cDNA library. 

The invention also comprises a method for identifying the nucleic acid
sequences which promote and/or regulate the expression of the SPG4 gene of
sequence SEQ ID No. 1, characterized in that it uses a nucleic acid according to the
invention.

The computer tools available to those skilled in the art enable them to easily
identify, using the genomic nucleic acid sequences according to the invention, the
promoter regulatory boxes necessary and sufficient for controlling gene expression, in
particular the TATA, CCAAT and GC boxes, and also the stimulatory regulatory
sequences ("enhancers"), or inhibitory regulatory sequences ("silencers"), which
control, in cis, the expression of the genes according to the invention; among these
regulatory sequences, mention should be made of IRE, MRE and CRE.
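
A very reduced sketch of such a computer-assisted search is given below. The motif patterns are simplified textbook consensus forms chosen for illustration (an assumption, not patterns specified in this description), only the forward strand is scanned, and the toy sequence is hypothetical. Dedicated promoter-prediction tools use position weight matrices and contextual information that this sketch does not attempt to reproduce.

```python
import re

# Simplified consensus patterns (illustrative assumption, not from the source).
MOTIFS = {
    "TATA box": r"TATA[AT]A[AT]",
    "CCAAT box": r"CCAAT",
    "GC box": r"GGGCGG",
}


def scan_boxes(sequence):
    """Return sorted (position, name, matched text) tuples, 0-based, forward strand only."""
    seq = sequence.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are also reported.
        for match in re.finditer("(?=(" + pattern + "))", seq):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    toy_promoter = "GGGCGGTTCCAATCGTATAAAAGG"
    for position, name, text in scan_boxes(toy_promoter):
        print(position, name, text)
```

Scanning the reverse complement as well, and extending `MOTIFS` with patterns for other elements, would be natural next steps.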

The invention also relates to the methods for identifying mutations carried by
the human SPG4 gene, in particular mutations responsible for autosomal dominant
hereditary spastic paraplegia, characterized in that they use a nucleic acid sequence
according to the invention.

These methods for identifying these mutations will, in particular, comprise the
following steps: (i) isolation of the DNA from the biological sample to be analyzed, or
production of a cDNA from the mRNA of the biological sample; (ii) specific amplification
of the target DNA likely to have a mutation, using primers according to the invention;
(iii) analysis of the amplification products, in particular the size and/or the sequence of
the amplification products, with respect to a reference sequence.
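
Step (iii) can be pictured with the following minimal sketch, which compares the sequence of an amplification product with a reference sequence. It reports the size difference and, when the two sequences have the same length, the positions of any substitutions. Insertions and deletions would require a proper alignment, which is deliberately left out; the function name and the toy sequences are hypothetical.

```python
def compare_to_reference(product, reference):
    """Compare an amplification product with a reference sequence.

    Returns a dict with the size difference (product minus reference) and,
    for equal-length sequences only, a list of substitutions given as
    (1-based position, reference base, observed base).
    """
    product, reference = product.upper(), reference.upper()
    report = {
        "size_difference": len(product) - len(reference),
        "substitutions": [],
    }
    if report["size_difference"] == 0:
        report["substitutions"] = [
            (index + 1, ref_base, obs_base)
            for index, (ref_base, obs_base) in enumerate(zip(reference, product))
            if ref_base != obs_base
        ]
    return report


if __name__ == "__main__":
    toy_reference = "ATGGCCTTAGGC"
    print(compare_to_reference("ATGGCCTGAGGC", toy_reference))
    print(compare_to_reference("ATGGCCTAGG", toy_reference))
```

A non-zero size difference points to an insertion, deletion or splicing change, whereas a substitution list points to point mutations at the reported positions.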

The expression "methods for identifying a mutation according to the invention"
is also intended to refer to a method which makes it possible to obtain the nucleic acid
on which said mutation has been identified.

The promoter and/or regulatory sequences of the SPG4 gene according to the
invention having mutations which may modify the expression of the corresponding
protein also form part of the invention.

The nucleic acids characterized in that they can be obtained using one of the
preceding methods according to the invention, or the nucleic acids capable of
hybridizing, under high stringency conditions (homology of at least 80% between one of
the two sequences and the sequence complementary to the other), with said nucleic
acids, form part of the invention, especially the variant or homologous nucleic acids, in
particular the nucleic acid sequences of allelic variants of the SPG4 gene of sequence
SEQ ID No. 1 or of its cDNA of sequence SEQ ID No. 2, and also the genomic
sequences of the homologous genes of other mammals such as mice.

In the present description, the term "Spg4" will be intended to refer to the
mouse gene homologous to the human SPG4 gene.

The use of a nucleic acid sequence according to the invention as a probe or
primer for screening a genomic or cDNA library of course forms part of the subject of
the present invention.

In another aspect, the invention comprises a purified or isolated polypeptide
encoded by a nucleic acid according to the invention.

In the present description, the term "polypeptide" will be used to refer equally to 
a protein or a peptide. 

Preferably, the present invention relates to a polypeptide, characterized in that it
comprises an amino acid sequence chosen from the following group:

- the sequence SEQ ID No. 3, corresponding to human spastin encoded by the
sequence SEQ ID No. 2 of the cDNA of the human SPG4 gene;

- the sequence SEQ ID No. 73, corresponding to a fragment of murine spastin encoded
by the sequence SEQ ID No. 72 of the incomplete cDNA of the mouse Spg4 gene;
the sequence SEQ ID No. 73 is represented in Figure 4A, "SPAST_MOUSE" line;

- the sequences of polypeptides which are homologues and variants of the polypeptide
of sequence SEQ ID No. 3 or SEQ ID No. 73; and

- the sequences of the fragments thereof of at least 8, 10, 15, 30 or 50 consecutive
amino acids.

Also preferably, a subject of the invention is a polypeptide, characterized in that
it comprises an amino acid sequence chosen from the following group:

- the sequence SEQ ID No. 3 and the sequence SEQ ID No. 73, said sequences
carrying at least one of the mutations the nature and location of which are identified in
Table 5 hereinafter; and

- the sequences of the fragments thereof of at least 8, 10, 15, 30 or 50 consecutive
amino acids.

It should be understood that the invention does not relate to polypeptides in
natural form, i.e. they are not taken in their natural environment. Specifically, the invention
relates to the peptides which are obtained by purification from natural sources, or
obtained by genetic recombination or by chemical synthesis, and which can therefore
include unnatural amino acids. The production of a recombinant polypeptide, which can
be carried out using one of the nucleotide sequences according to the invention, is
particularly advantageous since it makes it possible to obtain an increased degree of
purity of the desired polypeptide.

The term "homologous polypeptide" will be intended to refer to the polypeptides
which have certain modifications with respect to the reference polypeptide, such as in
particular one or more deletions or truncations, an extension, a chimeric fusion and/or
one or more substitutions, and the amino acid sequence of which shows at least 80%,
preferably 90% or 95%, identity after alignment, with the reference amino acid
sequence.

The term "variant polypeptide" (or protein variant) will be intended to refer to the
set of polypeptides encoded by the variant nucleic acid sequences as defined above.

In particular, the variant polypeptides will comprise any polypeptide which is
encoded by the mutated genomic sequence of the SPG4 gene of sequence SEQ ID
No. 1, and the amino acid sequence of which has at least one mutation corresponding
in particular to a truncation, deletion, substitution and/or addition of amino acid residues
with respect to the sequence SEQ ID No. 3. In the present case, the variant
polypeptides having at least one mutation will be linked to the pathologies of AD-HSP
type.

The term "variant polypeptide" will also be intended to refer to any polypeptide
resulting from mutation of a splice site in the genomic nucleic acid sequence SEQ ID
No. 1.

The invention also comprises the cloning and/or expression vectors containing 
a nucleic acid sequence according to the invention. 

The vectors according to the invention, characterized in that they include the
elements which allow the expression and/or the secretion of said sequences in a host
cell, or a cellular addressing sequence, also form part of the invention.

The vectors characterized in that they include a promoter and/or regulator 
sequence according to the invention also form part of the invention. 

Said vectors will preferably include a promoter, translation initiation and
termination signals, and also suitable regions for regulating the transcription. They
should be able to be maintained stably in the cell and can, optionally, have particular
signals which specify secretion of the translated protein.

These various control signals are chosen as a function of the host cell used. To
this effect, the nucleic acid sequences according to the invention can be inserted into
vectors which replicate autonomously in the host chosen, or vectors which integrate in
the host chosen.

Among the systems which replicate autonomously, use will preferably be made,
as a function of the host cell, of the systems of plasmid or viral type, the viral vectors
possibly in particular being adenoviruses (Perricaudet et al., 1992), retroviruses,
lentiviruses, poxviruses or herpesviruses (Epstein et al., 1992). Those skilled in the art
know the technology which can be used for each of these systems.

When integration of the sequence into the chromosomes of the host cell is 
desired, use may be made, for example, of the systems of plasmid or viral type; such 
viruses will, for example, be retroviruses (Temin, 1986), or AAVs (Carter, 1993). 

Among the nonviral vectors, preference is given to naked polynucleotides such
as naked DNA or naked RNA according to the technique developed by the company
VICAL, yeast artificial chromosomes (YAC) for expression in yeast, mouse artificial
chromosomes (MAC) for expression in murine cells and, preferably, human artificial
chromosomes (HAC) for expression in human cells.

Such vectors will be prepared according to the methods commonly used by
those skilled in the art, and the clones resulting therefrom can be introduced into a
suitable host using standard methods, such as for example lipofection, electroporation
or heat shock.

The invention also comprises the host cells, in particular the eukaryotic and
prokaryotic cells, transformed with the vectors according to the invention, and also the
transgenic animals, except humans, comprising one of said transformed cells
according to the invention.

Among the cells which can be used for these purposes, mention may of course
be made of bacterial cells (Olins and Lee, 1993), but also yeast cells (Buckholz, 1993),
as well as animal cells, in particular cultures of mammalian cells (Edwards and Aruffo,
1993), and especially Chinese hamster ovary (CHO) cells, but also insect cells in which
it is possible to use processes implementing baculoviruses, for example (Luckow,
1993). A preferred cellular host for expressing the proteins of the invention consists of
CHO cells.



Among the mammals according to the invention, preference will be given to 
animals such as mice, rats or rabbits, expressing a polypeptide according to the 
invention. 

Among the mammals according to the invention, preference will also be given to
those comprising a transformed cell characterized in that the sequence of at least one
of the two alleles of the SPG4 gene contains at least one of the mutations the position
and the nature of which are identified in Table 5 or identified using a method according
to the present invention.

Among the mammals according to the invention, preference will also be given to
animals such as mice, rats or rabbits, characterized in that the gene encoding spastin
according to the invention is not functional or is knocked out.

Among the animal models more particularly advantageous herein, there are, in
particular:

- the transgenic animals having, at least in one of their two allelic sequences of the
SPG4 gene, at least one of the mutations the position and nature of which are
identified in Table 5 or identified using a method according to the present invention.
These transgenic animals are obtained, for example, by homologous recombination
on embryonic stem cells, transfer of these stem cells to embryos, selection of the
chimeras affected in the reproductive lines, and growth of said chimeras;

- the transgenic animals (preferably mice) overexpressing the SPG4 gene into which
one of said mutations according to the invention may be introduced. The mice are
obtained, for example, by transfection of a copy of this gene under the control of a
strong promoter which is ubiquitous in nature or selective for a tissue type, or after
viral transcription;

- the transgenic animals (preferably mice) made deficient for the SPG4 gene according
to the invention by inactivation using the LOXP/CRE recombinase system (Rohlmann
et al., 1996) or any other system for inactivating the expression of this gene.

The cells and mammals according to the invention can be used in a method for
producing a polypeptide according to the invention, as described below, and can also
be used as a model for analysis and for DNA (genomic or cDNA) library screening.

The transformed cells or mammals as described above can thus be used as
models in order to study the interactions between the polypeptides according to the
invention, and chemical or protein compounds, which are involved directly or indirectly
in the activities of the polypeptides according to the invention, this being in order to
study the various mechanisms and interactions which come into play.

They can especially be used for selecting products which interact with the
polypeptides according to the invention, in particular human spastin of sequence SEQ
ID No. 3 or the variants thereof according to the invention, as a cofactor or as an
inhibitor, in particular a competitive inhibitor, or which have agonist or antagonist
activity for the activity of the polypeptides according to the invention. Preferably, said
transformed cells or transgenic animals will be used as a model which, in particular,
enables the selection of products which make it possible to combat the pathology
linked to the SPG4 gene mentioned above.

The invention also relates to the use of a cell, of a mammal or of a polypeptide
according to the invention for screening a chemical or biochemical compound which
can interact directly or indirectly with the polypeptides according to the invention,
and/or which is capable of modulating the expression or the activity of these
polypeptides.

The invention also relates to the use of a nucleic acid sequence according to
the invention for synthesizing recombinant polypeptides.

The method for producing a polypeptide of the invention in recombinant form is,
itself, included in the present invention, and is characterized in that the transformed
cells, in particular the cells or mammals of the present invention, are cultured under
conditions which allow the expression of a recombinant polypeptide encoded by a
nucleic acid sequence according to the invention, and in that said recombinant
polypeptide is recovered.

The recombinant polypeptides, characterized in that they can be obtained using
said production method, also form part of the invention.

The recombinant polypeptides obtained as indicated above can be in both
glycosylated and nonglycosylated form and may or may not have the natural tertiary
structure.

These polypeptides can be produced based on the nucleic acid sequences
defined above, according to the techniques for producing recombinant polypeptides
known to those skilled in the art. In this case, the nucleic acid sequence used is placed
under the control of signals which allow its expression in a cellular host.

An effective system for producing a recombinant polypeptide requires a vector 
and a host cell according to the invention. 

These cells can be obtained by introducing into host cells a nucleotide
sequence inserted into a vector as defined above, and then culturing said cells under
conditions which allow the replication and/or expression of the transfected nucleotide
sequence.

The processes used for purifying a recombinant polypeptide are
known to those skilled in the art. The recombinant polypeptide can be purified from cell
lysates and extracts and/or from the culture medium supernatant, with methods used
individually or in combination, such as fractionation, chromatography methods,
immunoaffinity techniques using specific monoclonal or polyclonal antibodies, etc.

The polypeptides according to the present invention can be obtained by
chemical synthesis, this using one of the many known peptide syntheses, for example
the techniques which implement solid phases or techniques which use partial solid
phases, by condensation of fragments or by conventional synthesis in solution.

The solid-phase synthesis technique is well known to those skilled in the art.
See in particular Stewart et al. (1984) and Bodansky (1984).

The polypeptides which are obtained by chemical synthesis and which can
include corresponding unnatural amino acids are also included in the invention.

The mono- or polyclonal antibodies or their fragments, chimeric antibodies or 
immunoconjugates, characterized in that they are capable of specifically recognizing a 
polypeptide according to the invention, form part of the invention. 

Specific polyclonal antibodies can be obtained from a serum of an animal
immunized against the polypeptides according to the invention, in particular produced
by genetic recombination or by peptide synthesis, according to conventional
procedures.

The advantage of antibodies which specifically recognize certain polypeptides,
variants or immunogenic fragments thereof, according to the invention, will in particular
be noted.

The specific monoclonal antibodies can be obtained according to the
conventional hybridoma culture method described by Kohler and Milstein, 1975.

The antibodies according to the invention are, for example, chimeric antibodies,
humanized antibodies, or Fab or F(ab')2 fragments. They can also be in the form of
labelled antibodies or immunoconjugates in order to obtain a detectable and/or
quantifiable signal.

The invention also relates to methods for detecting and/or purifying a
polypeptide according to the invention, characterized in that they use an antibody
according to the invention.

The invention also comprises purified polypeptides, characterized in that they
are obtained using a method according to the invention.

Moreover, besides their use for purifying the polypeptides, the antibodies of the
invention, in particular the monoclonal antibodies, can also be used for detecting these
polypeptides in a biological sample.

They thus constitute a means of immunocytochemically or
immunohistochemically analyzing the expression of the polypeptides according to the
invention, in particular the polypeptide of sequence SEQ ID No. 3 or a variant thereof,
on specific tissue sections, for example by immunofluorescence or gold labelling, or
with enzymatic immunoconjugates.

They may make it possible, in particular, to demonstrate abnormal expression 
of these polypeptides in the biological samples or tissues, which makes them useful for 
monitoring the progression of the disease and the molecular diagnosis. 

More generally, the antibodies of the invention can be advantageously used in
any situation in which the expression of a normal or mutated polypeptide according to
the invention must be observed.

The methods for determining allelic variability, a mutation, a deletion, a loss of
heterozygosity or any genetic abnormality of the SPG4 gene, according to the
invention, characterized in that they use a nucleic acid sequence or an antibody
according to the invention, also form part of the invention.

The present invention thus comprises a method for genotypic diagnosis of the 
pathology associated with the SPG4 gene, characterized in that a nucleic acid 
sequence according to the invention is used. 

Preferably, the invention relates to a method for genotypic diagnosis of the
disease associated with the presence of at least one mutation on a sequence of the
SPG4 gene, using a biological sample from a patient, characterized in that it includes
the following steps:

a) where appropriate, isolation of the genomic DNA from the biological sample to be
analyzed, or production of cDNA from the RNA of the biological sample;
b) specific amplification of said DNA sequence of the SPG4 gene likely to contain a
mutation, using primers according to the invention;
c) analysis of the amplification products obtained and comparison of their sequence
with the corresponding normal sequence of the SPG4 gene.
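
By way of illustration only, the comparison of step c) can be sketched in a few lines of Python. The function name `find_substitutions` and both short sequences are hypothetical and are not taken from the SPG4 sequence; the sketch assumes that the amplified sequence and the normal sequence are already aligned and of equal length, so it reports substitutions only and would not by itself detect insertions or deletions.

```python
def find_substitutions(reference, sample, offset=1):
    """List (position, reference base, sample base) for every mismatch.

    Both sequences are assumed to be pre-aligned and of equal length;
    `offset` is the nucleotide number given to the first base.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = zip(reference.upper(), sample.upper())
    return [(i + offset, r, s) for i, (r, s) in enumerate(pairs) if r != s]


# Hypothetical 15-nt fragments, for illustration only
reference = "ATGGCCAAGGACCGC"  # stands in for the normal sequence
patient = "ATGGCCAAGGTCCGC"    # stands in for the amplified patient sequence

print(find_substitutions(reference, patient))  # [(11, 'A', 'T')]
```

In practice the `offset` argument would be set to the cDNA position of the first base of the amplified fragment, so that the reported positions are directly comparable with the numbering used for the mutations.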

The invention also comprises a method for diagnosing the disease associated
with abnormal expression of a polypeptide encoded by the SPG4 gene, in particular the
polypeptide of sequence SEQ ID No. 3, characterized in that one or more antibodies
according to the invention is (are) brought into contact with the biological material to be
tested, under conditions which allow the possible formation of specific immunological
complexes between said polypeptide and said antibody or antibodies, and in that the
immunological complexes possibly formed are detected and/or quantified.

These methods are, for example, directed toward the methods for diagnosis, in
particular antenatal diagnosis, of AD-HSP associated with the presence of a mutation
in the SPG4 gene, according to the invention, by determining, using a biological
sample from the patient, the presence of mutations in at least one of the sequences
described above. The nucleic acid sequences analyzed may equally be genomic DNA,
cDNA or mRNA.

Nucleic acids or antibodies based on the present invention may also be used to 
enable positive diagnosis in a patient or presymptomatic diagnosis in an individual at 
risk, in particular an individual with a family history of the disease. 

There are, of course, a great number of methods which make it possible to
demonstrate a mutation in a gene with respect to the wild-type gene. They can
essentially be divided into two main categories. The first type of method is that in which
the presence of a mutation is detected by comparing the mutated sequence with the
corresponding wild-type sequence, and the second type is that in which the presence
of the mutation is detected indirectly, for example through evidence of mismatches due
to the presence of the mutation.

These methods can use the probes and primers of the present invention which
have been described. They are generally purified nucleic acid hybridization sequences
comprising at least 15 nucleotides, preferably 20, 25 or 30 nucleotides, characterized in
that they can hybridize specifically with a nucleic acid sequence according to the
invention.

Preferably, the specific hybridization conditions are such as those defined
above or in the examples. The length of these nucleic acid hybridization sequences
can range from 15, 20 or 30 to 200 nucleotides, particularly from 20 to 50 nucleotides.

Among the methods for determining allelic variability, a mutation, a deletion, a
loss of heterozygosity or a genetic abnormality, preference is given to the methods
comprising at least one so-called PCR (polymerase chain reaction) or PCR-like
amplification step for the target sequence according to the invention likely to have an
abnormality, using a pair of primers having nucleotide sequences according to the
invention. The amplified products may be treated with a suitable restriction enzyme
before carrying out the detection and assaying of the product targeted.

The mutations of the SPG4 gene according to the invention may be responsible
for various modifications of the translation product thereof, these modifications possibly
being used for a diagnostic approach. Specifically, the antigenicity modifications linked
to these mutations may allow the development of specific antibodies. The mutated
gene product can be distinguished using these methods. All these modifications can be
employed in a diagnostic approach, using several well-known methods based on the
use of mono- or polyclonal antibodies which recognize the normal polypeptide or
mutated variants, such as for example by RIA or by ELISA.

In another aspect, the invention comprises a method for selecting a chemical or
biochemical compound capable of preventing and/or treating AD-HSP associated with
the SPG4 gene, characterized in that a nucleic acid sequence according to the
invention, a polypeptide according to the invention, a vector according to the invention,
a cell according to the invention, a mammal according to the invention or an antibody
according to the invention is used.

The methods for selecting chemical or biochemical compounds capable of
interacting directly or indirectly with polypeptides according to the invention or with the
nucleic acids according to the invention, and/or making it possible to modulate the
expression or the activity of these polypeptides, characterized in that they comprise
bringing a polypeptide according to the invention, a transformed cell according to the
invention or a mammal according to the invention into contact with a candidate
compound, and detecting a modification of the activity of said polypeptide, are also
included in the invention.

For example, but without being limited thereto, mention may be made of a
method for identifying molecules capable of interacting with a polypeptide according to
the invention, using a bacterial or yeast two hybrid system such as the Matchmaker
Two Hybrid System 2, according to the instructions of the manual which is supplied
with the Matchmaker Two Hybrid System 2 (Catalogue No. K1604-1, Clontech).

The nucleic acids encoding proteins which interact with the promoter and/or
regulatory sequences of the SPG4 gene, according to the invention, can be screened
and/or selected using a one hybrid system such as that described in the manual which
is supplied with the Matchmaker One Hybrid System kit from Clontech (Catalogue No.
K1603-).

In another aspect, the invention comprises the use of a nucleic acid or of a
polypeptide according to the invention, of a vector according to the invention, of a cell
according to the invention or of a mammal according to the invention, for studying the
expression or the activity of the SPG4 gene.

Other characteristics and advantages of the invention appear in the remainder
of the description with the examples and figures, the legends of which are given
hereinafter.

LEGENDS OF THE FIGURES 

FIGURES 1A, 1B and 1C : Physical map of the SPG4 range and genomic organization
of SPG4.

FIGURE 1A : The 1.5 Mb candidate region is delimited by the D2S352 and
D2S2347 genetic markers indicated in bold characters. The position of the polymorphic
markers and other STSs is indicated in standard characters, whereas the position of
the ESTs is indicated in italics. The BAC clones constituting the presequencing map
are represented by rectangles, with the name shown above and the precise size of the
clone, if it could be determined, shown below. The name of the BACs A, B, C, etc. is
followed by brackets containing the name of the clone preceded by a "b" if the clone is
derived from the BAC library CITB_978_SKB, or by a "B" if it originates from the
library RPCI-11.

FIGURE 1B : Schematic representation of the SPG4 gene which overlaps
BACs D (b336P14) and G (B563N4). The exons are shown as black rectangles with
their name above.

FIGURE 1C : The five mutations identified in seven SPG4 locus-linked AD-HSP
families are positioned in exons 7, 11 and 13 and in the splice acceptor site of intron
15.

FIGURE 2 : Nucleic acid and protein sequence of the SPG4 cDNA of spastin.

The 17 vertical bars with a number located below represent the junctions
between the various exons. The ATG initiator codon is located at nt position 126-128
and the STOP codon for termination is located at nt position 1974-1976. Five of the
mutations identified to date, including the loss of exon 16, are indicated in italics
(nt 1210, nt 1468, nt 1520, nt 1620 and for the loss of exon 16: nt 1813-1853). The
polyadenylation site is in italics and underlined. The putative nuclear localization signal
(NLS), RGKKK, and also the three conserved domains predicted by the analysis in the
ProDom database are located at aa positions 7-11 (NLS), 342-409 (domain 92),
411-509 (domain 179) and 512-599 (domain 6226), respectively. The four motifs
predicted by the sequence comparison in the Prosite database are: two "leucine
zipper" motifs at aa positions 50-78 and 508-529, the ATP binding site (or Walker A
motif) at aa positions 382-389 and the "helix-loop-helix" dimerization domain at aa
positions 478-486. The Walker A and B motifs, "GPPGNGKT" and "IIFIDE", and also
the AAA minimum consensus [lacuna] are underlined.
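
The coordinates given in this legend lend themselves to a short arithmetic check, sketched below in Python. The variable names are illustrative, and the calculation assumes that all nucleotide positions are inclusive; under that assumption the open reading frame spans 1851 nt, i.e. 616 coding codons plus the STOP codon, and the segment lost with exon 16 is 41 nt long, which is not a multiple of three.

```python
# Coordinates quoted in the legend of Figure 2 (assumed inclusive)
ATG_FIRST_NT = 126       # first nucleotide of the ATG initiator codon
STOP_LAST_NT = 1976      # last nucleotide of the STOP codon
EXON16_LOSS = (1813, 1853)

orf_length_nt = STOP_LAST_NT - ATG_FIRST_NT + 1   # ORF including the STOP codon
codons, remainder = divmod(orf_length_nt, 3)
protein_length_aa = codons - 1                    # the STOP codon is not translated

deleted_nt = EXON16_LOSS[1] - EXON16_LOSS[0] + 1
shifts_frame = deleted_nt % 3 != 0                # True if the deletion breaks the frame

print(orf_length_nt, remainder, protein_length_aa)  # 1851 0 616
print(deleted_nt, shifts_frame)                     # 41 True
```

A protein length of 616 amino acids is compatible with the domain coordinates of the legend, the last of which ends at aa position 599.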

FIGURES 3A, 3B and 3C : Characterization of a splice site mutation in the affected
individuals of three SPG4 locus-linked AD-HSP families.

FIGURE 3A : PCR amplification of fragment IV of the SPG4 cDNA using
lymphoblast cDNA: well M, size marker VII (Boehringer); well 1, unaffected member of
family 2992; well 2, patient of family 2992; well 3, unaffected member of family 5330;
well 4, patient of family 5330; well 5, patient of family 5226; well 6, negative control
(human genomic DNA).

FIGURE 3B : Sequence graph for the mutation of the splice acceptor site of
intron 15.

Genomic sequence of the control individual above and of a patient of family
2992 below. The asterisk at nt position 1813^1 indicates an A->C polymorphism which
affects a nonconserved nucleotide of the splice acceptor site of intron 15 in the patient.

FIGURES 4A and 4B : Spastin homologies.

The conserved identical residues are highlighted in blue and yellow.

FIGURE 4A : Multiple alignment created by CLUSTAL W of eight proteins
derived from various organisms and having strong sequence homology with human
spastin and murine spastin (SEQ ID No. 73).

FIGURE 4B : Alignment by CLUSTAL W of the yeast metalloproteases AFG3,
RCA1 and YME1, and of human paraplegin and spastin.

FIGURE 5: Alignment by BLASTN of the nucleic acid sequences of the SPG4 cDNA
and of its mouse ortholog Spg4 (SEQ ID No. 72). The polyadenylation site of the
murine cDNA is underlined and in italics. The STOP codon is located at nt position
1515-1517 in the murine cDNA and at nt position 1974-1976 in the human cDNA.

FIGURES 6A, 6B and 6C : PCR analysis of the expression of SPG4 and of its murine
ortholog Spg4.

FIGURE 6A : Collection of cDNA originating from multiple mouse tissues.

Well M, size marker V (Boehringer); well 1, heart; well 2, brain; well 3, spleen;
well 4, lung; well 5, liver; well 6, skeletal muscle; well 7, kidney; well 8, testicle; well 9,
E7 7-day embryo; well 10, E11 11-day embryo; well 11, E15 15-day embryo; well 12,
E17 17-day embryo; well 13, negative control (mouse genomic DNA).

FIGURE 6B : Collection of cDNA originating from multiple human tissues.

Well M, size marker VII (Boehringer); well 1, brain; well 2, heart; well 3, kidney;
well 4, liver; well 5, lung; well 6, pancreas; well 7, placenta; well 8, skeletal muscle;
well 9, negative control (human genomic DNA); well 10, negative control (no DNA).

FIGURE 6C : Collection of cDNA originating from multiple human foetal tissues.

Well M, size marker VII (Boehringer); well 1, brain; well 2, heart; well 3, kidney;
well 4, liver; well 5, lung; well 6, skeletal muscle; well 7, spleen; well 8, thymus; well 9,
negative control (human genomic DNA); well 10, negative control (no DNA).

EXAMPLES 

Example 1 : Materials and methods 

1) Subcloning and sequencing of the candidate region 

Twelve BACs originating from two human genomic libraries, CITB_978_SKB
(sold by Research Genetics) and RPCI-11 (18), and covering the SPG4 range, were
selected to be sequenced (Hazan et al., in press Genomics). 40 µg of the DNA of each
BAC were partially digested with the CviJI restriction enzyme (CHIMERx) and
separated by electrophoresis on 0.4% LMP agarose gel (FMC). DNA fractions, the
sizes of which were in the region of 3, 5 and 10 kb, were eluted with β-agarase
(Biolabs) and ligated to a plasmid vector pBAM3, which had been digested with SmaI
and dephosphorylated beforehand, in a ratio of 1 x insert per 5 x vector.
Electrocompetent E. coli DH10B bacteria (GIBCO-BRL) were transformed with the
various ligations, by electroporation. Approximately 1 000 to 1 500 subclones per BAC
(8 to 10 equivalent genomes), consisting of 20% of clones with 10 kb inserts, 40% of
clones with 5 kb inserts and 40% of clones with 3 kb inserts, were isolated. The
ends of the inserts of these clones were sequenced on a LICOR 4200 automatic
sequencer. For each BAC, the sequences were assembled into a backbone consisting
of several contigs, using the Phred and Phrap programs. The gaps between
contigs were sequenced with labelled dideoxynucleotides on an ABI 377 sequencer
(PE-Applied Biosystems). The exons contained in these sequence contigs were
predicted with the GRAIL II, GENSCAN, FGENEH and Genie computer programs. The
sequences were also compared against the EMBL and GenBank nucleic acid and protein
databases, with the BLASTN and BLASTX programs. The determination of the
promoter sequences was carried out using the TSSG and TSSW computer programs.
The results of all these sequence analyses were visualized using the Genotator
sequence annotation program.

2) cDNA cloning

The cDNA of the SPG4 gene was isolated through 5' and 3' RACE-PCR 
experiments on polyA+ RNAs of foetal brain, adult brain and adult liver, using the 
Marathon cDNA amplification kit (Clontech) according to the supplier's instructions. A 
first PCR followed by an internal PCR were carried out with various pairs of primers, 
the sequences of which are indicated in Table 1 hereinafter:

Table 1

Primers used for the RACE-PCRs and the cDNA amplifications

Primer       Sequence (5'-3')              5' position  Pair  PCR product size

SPA_5RACE5   CGGAGCTCCTCTTGGCTGCCATG       nt 405
SPA_5RACE6   AGAAGCGCTGGCAGAGCCACACGAAG    nt 372
SPA_5RACE7   AAGGCGACCAAACGCAGCAGCGCGAAG   nt 331
SPA_3RACE1   AGGAGCAAGCTGTGGAATGGTATAAG    nt 550
SPA_3RACE2   TGGTTATGGCCAAGGACCGCTTACAAC   nt 689
SPA_3RACE3   CAAACGGACGTCTATAATGACAGTAC    nt 747
SPA_3RACE4   TTAGGAATGTGGACAGCAACCTTGC     nt 1075
SPA_3RACE5   CTTCTCTGAGGCCTGAGTTGTTCAC     nt 1207
SPA_3RACE6   TGCTAGAATGACTGATGGATACTCAGG   nt 1736
SPA_3RACE7   AGATGCAGCACTGGGTCCTATCCG      nt 1787
SPA_3RACE8   ATGAACGTCATCGGCTACAGAAACAG    nt 2037

SPA_Db       TAGCAGTGGCTGCCGCCGT           nt 45        b+m   655 bp
SPA_Dm       AAGCGGTCCTTGGCCATAAC          nt 700
SPA_Dc       GGCGGCAGTGAGAGCTGTG           nt 106       c+n   543 bp
SPA_Dn       CTAGCTCTTTCACACTGTTC          nt 649

SPA_Ad       AACAGGCCTTCGAGTACATC          nt 487       d+m   746 bp
SPA_Am       CTGTGAACAACTCAGGCCTC          nt 1233
SPA_Ac       ATGAGAAAGCAGGACAGAAG          nt 532
SPA_An       TGCCAAGTCTTGACCAGC            nt 1175

SPA_Ba       CTACAACTGCTACTCGTAAG          nt 1036      a+m   763 bp
SPA_Bm       CAGTGCTGCATCTTTTGCC           nt 1799
SPA_Bb       TAGGAATGTGGACAGCAACC          nt 1076
SPA_Bn       AAAGCTGTTAGGTCACTTCC          nt 1780

SPA_Ca       TGGAGATGACAGAGTACTTG          nt 1550      a+m   766 bp
SPA_Cm       CTGGAATACTTTCATCTGC           nt 2316
SPA_Cb       ATGAGGCTGTTCTCAGGCG           nt 1603



The RACE-PCR products were cloned with the TA-cloning kit (Invitrogen) and
the corresponding clones were sequenced on an ABI 377 (PE-Applied Biosystems).
The sequence of the SPG4 transcript was verified by sequencing PCR products
amplified from a cDNA population originating from the lymphoblasts of 6 healthy
individuals.

3) Detection of mutations 

The total RNAs were extracted from lymphoblast lines of one affected individual
per family studied and of 6 control individuals, using the RNA PLUS kit (Bioprobe
Systems). The cDNA synthesis was carried out on 500 ng to 1 µg of RNA, with 100 pmol
of random hexameric primers (Pharmacia) and 200 units of Superscript II reverse
transcriptase (Gibco BRL), under standard conditions. Four PCR amplifications,
generating overlapping fragments which cover all of the SPG4 open reading frame,
were carried out on the cDNAs of the patients and controls. Fragment I was amplified
with the SPA_Db/SPA_Dm primers, and then by internal PCR with the
SPA_Dc/SPA_Dn primers. Fragments II, III, and IV were amplified with the
SPA_Ad/SPA_Am, SPA_Ba/SPA_Bm and SPA_Ca/SPA_Cm primers (cf. the
sequences of these primers in Table 1), respectively. Each amplification was carried
out in a total volume of 50 µl containing 4 µl of cDNA (approximately 1/7th of the prep.),
20 pmol of each primer, 200 µM of dNTPs, 50 mM of KCl, 10 mM of Tris, pH 9, 1.5 mM MgCl2,
0.1% of Triton X-100, 0.01% of gelatin and 2.5 units of Taq polymerase (Cetus-PE). The
PCR reactions were carried out according to the "hot start" process: the Taq
polymerase is added at 92°C, after a first denaturation step of 5 min at 94°C. The
samples are subsequently subjected to 35 cycles of denaturation (94°C for 40 sec), of
hybridization (55°C for 50 sec, with the exception of fragment I: 58°C for 50 sec) and of
elongation (72°C for 1 min), followed by a final elongation step (5 min at 72°C). The
PCR products are sequenced on an ABI 377 automatic sequencer (PE-Applied
Biosystems), with the SPA_Dc/SPA_Dn, SPA_Ac/SPA_An, SPA_Bb/SPA_Bn and
SPA_Cb/SPA_Cm primers for fragments I, II, III and IV, respectively.
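
As a rough consistency check on the amplification scheme described above, the following Python sketch recomputes the product sizes from the 5' positions listed in Table 1 and verifies that fragments I to IV overlap and span the open reading frame. It rests on two assumptions that are not stated in the text: that the product size equals the difference between the 5' positions of the two primers (which happens to reproduce the sizes of Table 1), and that the open reading frame runs from nt 126 to nt 1976 as indicated in the legend of Figure 2. The dictionary keys and function names are illustrative only.

```python
# 5' positions (nt) of each primer pair from Table 1: (forward, reverse)
PAIRS = {
    "I outer": (45, 700),     # SPA_Db / SPA_Dm
    "I nested": (106, 649),   # SPA_Dc / SPA_Dn
    "II": (487, 1233),        # SPA_Ad / SPA_Am
    "III": (1036, 1799),      # SPA_Ba / SPA_Bm
    "IV": (1550, 2316),       # SPA_Ca / SPA_Cm
}
STATED_SIZE_BP = {"I outer": 655, "I nested": 543, "II": 746, "III": 763, "IV": 766}
ORF = (126, 1976)  # ATG start to end of STOP codon, per the Figure 2 legend


def product_size(forward, reverse):
    # Assumption: size is the plain difference of the two 5' positions
    return reverse - forward


def covers(intervals, start, end):
    """True if the union of the intervals spans [start, end] without a gap."""
    reach = None
    for low, high in sorted(intervals):
        if reach is None:
            if low > start:
                return False
            reach = high
        elif low > reach:
            return False  # gap between consecutive fragments
        else:
            reach = max(reach, high)
        if reach >= end:
            return True
    return False


sizes_agree = all(product_size(*PAIRS[k]) == STATED_SIZE_BP[k] for k in PAIRS)
fragments = [PAIRS[k] for k in ("I nested", "II", "III", "IV")]
print(sizes_agree, covers(fragments, *ORF))  # True True
```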

The mutations were also sought or confirmed by sequencing the 17 predicted
exons of the SPG4 gene in the patients and controls. Each exon was amplified with the
corresponding "a+m" pair of primers (cf. Table 2 hereinafter), with the exception of
exon 1 (gSPAex1c/gSPAex1m), and exons 10, 11 and 12 which were co-amplified with
the gSPAex10a/gSPAex12m and gSPAex11a/gSPAex12m pairs of primers.

Table 2

PCR primers for amplifying and sequencing the exons

Exon        Product size  PCR program  Primer      Sequence (5'-3')

1           1048 bp       0            gSPAex1c    GTGAGCCGAACTGCACATTG
                                       gSPAex1m    CAAAGTCGACAGCTACAGTGC
                                       gSPAex1d    GGAACTGTAGTTGAGTGGGA
                                       gSPAex1n    AGATGAGGCTCCGACCTAC

2           624 bp        3            gSPAex2a    AATGCCACACTTGTAATCTC
                                       gSPAex2m    TGTGAATATATCATAATTTGGG
                                       gSPAex2b    TACAGCAGTTCTCATGATG

3           812 bp        1            gSPAex3a    GACCAAATTGGTGCATGCATG
                                       gSPAex3m    ACATTTCCAATACATCCCAC

4           379 bp        3            gSPAex4a    ATTTGTCATTTCACATGCAC
                                       gSPAex4m    TTAGAATGACTATACCTGAC
                                       gSPAex4n    TCAGGTTAAGTAAGACTC

5           830 bp        4            gSPAex5a    TTCCTATCTACCTAGTGAC
                                       gSPAex5m    I I I I ATAGCAAGTTGCCCTG
                                       gSPAex5b    CCTATGAAGATCCTGGTAC

6           484 bp        3            gSPAex6a    TGTCATGATTCTAACAAGGG
                                       gSPAex6m    TCTATTTCACTCCTGACATG

7           420 bp        2            gSPAex7a    GTCATAGGGCTTAGGCTTC
                                       gSPAex7m    ATCATACTACCCACTTTTCC

8           647 bp        3            gSPAex8a    TGTTTGGGAAGATGCTACTG
                                       gSPAex8m    CTACTGAAGATAACGTACATG

9           1268 bp       1            gSPAex9a    CATTGATTGCCATGTATTGG
                                       gSPAex9m    AGAAGGCCAGAAATACTCAG
                                       gSPAex9b    GTACTTAAATCGGTAAATATGG

10, 11, 12  1061 bp       4            gSPAex10a   CTCAAGTCTTAGGAATGCAG
                                       gSPAex10b   GCACTTAACCAGGCTGTATG
            551 bp        3            gSPAex11a   CTCAGATGACTCACATAGC
                                       gSPAex12m   CTTTACTAGACTAATTCTCCTG

13          1361 bp       4            gSPAex13a   CAGATTCAA6AAGACAGATC
                                       gSPAex13m   GCAATAATTCACCACACTTG
                                       gSPAex13n   GGTAGTTCTTGTTTCTGCTC

14          985 bp        4            gSPAex14a   CAAGTGTGGTGAATTATTGC
                                       gSPAex14m   GAGCTGAAAAGTATTCAGC
                                       gSPAex14n   TGCAAAGGACATAGCCAGTG

15          1076 bp       1            gSPAex15a   AGCCTCTGGAGATAGTATGC
                                       gSPAex15m   CTAGAACAGGGGTCACAGTC
                                       gSPAex15n   TTGGACTTCTTAAACTTC

16          1404 bp       4            gSPAex16a   GCAGTATGCAAGAAATTGAAC
                                       gSPAex16m   GGCCTGTAATTTTCTTCTG
                                       gSPAex16b   GTACTGAATAGATACATGTAG

17          445 bp        3            gSPAex17a   GTGTAGCAGATCAACATAG
                                       gSPAex17m   CATCTTCAAGTTTGGTGCAC

Other than for exon 1, which is amplified using the Advantage GC genomic
PCR kit (Clontech) according to the supplier's instructions, four slightly different PCR
programs (1, 2, 3 and 4) were used to amplify the SPG4 exons (see Table 2). The
amplifications were all carried out in a volume of 50 µl containing 100 ng of genomic
DNA, 50 pmol of each primer, 250 µM of dNTPs, 1X Takara buffer and 1 unit of Takara
LA Taq polymerase (Shuzo Co.). The PCR reactions were carried out according to
the "hot start" process: the Taq polymerase is added at 94°C, after a first denaturation
step of 5 min at 96°C. The samples are subsequently subjected to 30 cycles of
denaturation (94°C for 40 sec), of hybridization (prog. 1: 60°C for 50 sec; prog. 2: 58°C
for 50 sec; prog. 3 and 4: 55°C for 50 sec) and of elongation (prog. 1 and 4: 72°C for
1 min; prog. 2 and 3: 72°C for 40 sec), followed by a final elongation step (10 min at
72°C). The sequencing of these PCR products was carried out on an ABI 377
sequencer (PE-Applied Biosystems), using either the PCR primers or the internal
primers termed "b" and "n" (see Table 2).
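
The four genomic PCR programs differ only in hybridization temperature and elongation time, which can be summarized as a small data structure. The Python sketch below is illustrative only: the dictionary layout and the function `hold_time_s` are hypothetical, and the computed durations count programmed hold times alone, ignoring the temperature ramps of the thermocycler and the pause for adding the polymerase.

```python
# Parameters of programs 1 to 4 as given in the text
PROGRAMS = {
    1: {"anneal_c": 60, "anneal_s": 50, "extend_s": 60},
    2: {"anneal_c": 58, "anneal_s": 50, "extend_s": 40},
    3: {"anneal_c": 55, "anneal_s": 50, "extend_s": 40},
    4: {"anneal_c": 55, "anneal_s": 50, "extend_s": 60},
}
CYCLES = 30
DENATURE_S = 40          # 94 deg C for 40 sec in every cycle
INITIAL_DENATURE_S = 300  # 5 min at 96 deg C
FINAL_EXTEND_S = 600      # 10 min at 72 deg C


def hold_time_s(program):
    """Sum of programmed hold times in seconds (ramp times not included)."""
    p = PROGRAMS[program]
    per_cycle = DENATURE_S + p["anneal_s"] + p["extend_s"]
    return INITIAL_DENATURE_S + CYCLES * per_cycle + FINAL_EXTEND_S


for number in sorted(PROGRAMS):
    print(number, PROGRAMS[number]["anneal_c"], hold_time_s(number) / 60)
```

Under these assumptions, programs 1 and 4 amount to 90 min of hold time and programs 2 and 3 to 80 min; the longer elongation step is used for most of the larger products of Table 2.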

4) Characterization of SPG4

The cDNA clones 977312 (EST AA560327) and 568234 (EST AA107866)
derived from the mouse blastocyst and E8 embryo cDNA libraries, which both
correspond to the murine ortholog of SPG4, were isolated using the IMAGE consortium
and sequenced in the laboratory on an ABI 377 sequencer (PE-Applied Biosystems). In
order to analyze the expression profile of SPG4 and of its murine ortholog Spg4, the
collections of cDNA from various foetal and adult human tissues, and also from mouse
tissues (MTC panels, Clontech), were tested by PCR according to the supplier's
protocol, with the SPA_Ca/SPA_Cm pair of primers for the human cDNAs and the
SPA_Ca/spam (spam: 5'-ACCGAAGTCAAGAGCCTATC-3') pair for the mouse
cDNAs. The PCR conditions are those used for amplifying SPG4 from lymphoblast line
cDNA (cf. § Detection of mutations), except that these samples were subjected to
32 cycles for the cDNAs derived from adult human tissues and from mouse tissues,
and to 28 cycles for the cDNAs derived from foetal tissues. The amplification products
were separated by electrophoresis on 2% agarose gels.

5) Histological analysis of a muscle biopsy from a patient 

The histological and histo-enzymatic analyses were carried out on a muscle
biopsy from a patient derived from an SPG4 locus-linked family according to the
standard techniques described in Casari et al. (17).

6) Accession numbers in the public databases 

The SPG4 (or SPAST) cDNA and the deduced protein sequence, 
GenBank/EMBL AJ246001; the incomplete Spg4 cDNA clone, GenBank/EMBL 
AJ246002; the SPG4 (or SPAST) gene, GenBank/EMBL AJ246003. 

Example 2 : Analysis of the sequence of the SPG4 interval

The analysis of the recombination events made it possible to reduce the SPG4
candidate region to a genetic interval of 0 cM between the D2S352 and D2S2347
markers (19, 20). A presequencing map of the SPG4 interval composed of 37 BACs was
constructed (Hazan et al., in press in Genomics); the candidate region covers a
physical distance of approximately 1.5 Mb. Twelve overlapping BACs, stretching
over the SPG4 region, with the exception of a single 4 kb gap between clones A and
E, were selected to be sequenced (fig. 1A). Seven of these BACs (A, B, C, D, E, F and
G), covering approximately 70% of the region of interest, have already been
sequenced. The sequences of these 7 BACs were compared with those of the nucleic
acid and protein databases, and analyzed with four exon prediction programs. These
preliminary sequence analyses made it possible to reveal 14 potential transcription
units, including three corresponding to the genes encoding xanthine dehydrogenase,
steroid 5α-reductase 2 and a TGFβ-binding protein. Of the 14 genes detected by the
sequence analysis, 9 had been previously identified in the EST (for "Expressed
Sequence Tag") databases and located in the SPG4 interval (Hazan et al., in press in
Genomics); the 5 remaining genes could only be identified by sequencing the
candidate region. One of these 5 novel genes showed homology, in the 3' part of its
coding region, with the genes encoding the AAA protein family (16). More thorough sequence
analyses showed that this gene, named SPG4 (or SPAST), was composed of 17 exons
and extended over a region of approximately 90 kb, covered by two adjacent BAC
clones, D and G (cf. fig. 1B). The first three predicted exons of this gene were identified
in BAC D, by two of the four exon prediction programs used, GRAIL II and GENSCAN;
they show strong homology with a mouse blastocyst EST, AA560327. The last 14
exons are found in BAC G. The protein sequence deduced from exons 7 to 17 is
significantly homologous to a subclass of the AAA family, which includes the Yta6p
(21), TBP6 (21) and End13 yeast proteins, and also the SKD1 mouse protein (22).

Of the four exon prediction programs, FGENEH appears to be the most reliable
and the most powerful, enabling detection of most of the genes of this chromosomal
region at 2p21-p22. This observation also applies to the SPG4 gene, for which 15
exons could be demonstrated using this program, while only 4, 9 or 11 exons could be
located using the Genie, GRAIL II and GENSCAN programs, respectively. The
genomic organization of this gene (fig. 1B) could subsequently be confirmed by
determining the sequence of the SPG4 cDNA. The intron/exon junctions are
represented in Table 3 hereinafter: the exon size ranges from 41 bp (exon 16) to
1.410 kb (exon 17), and that of the introns from 140 bp (intron 11) to 23.247 kb
(intron 1).
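The comparison of the four programs can be restated as the fraction of the 17 confirmed SPG4 exons that each one located. The following is a minimal sketch using only the counts quoted above; the names `EXONS_FOUND` and `exon_sensitivity` are hypothetical, and the figure is a simple recovery rate that says nothing about false-positive predictions.

```python
TOTAL_EXONS = 17  # exons of SPG4 confirmed by the cDNA sequence

# Number of SPG4 exons located by each program, as reported in the text.
EXONS_FOUND = {"FGENEH": 15, "GENSCAN": 11, "GRAIL II": 9, "Genie": 4}


def exon_sensitivity(found: int, total: int = TOTAL_EXONS) -> float:
    """Fraction of the confirmed exons that a program located."""
    if not 0 <= found <= total:
        raise ValueError("found must lie between 0 and total")
    return found / total


# Programs ordered from the most to the least exons recovered.
ranking = sorted(EXONS_FOUND, key=EXONS_FOUND.get, reverse=True)
for name in ranking:
    found = EXONS_FOUND[name]
    print(f"{name:9s} {found:2d}/{TOTAL_EXONS}  {exon_sensitivity(found):.0%}")
```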

Example 3 : Identification of the SPG4 cDNA

Several successive amplifications by 5' and 3' RACE-PCR were carried out on
collections of adult liver and brain and foetal brain cDNA, in order to characterize the
SPG4 transcript. All the 5' RACE-PCRs gave amplification products terminating at nt
position 263 of the SPG4 cDNA (fig. 2), which was probably due to the high GC content of
the 5' region of the transcript (90% of GC in the 60 bp preceding nt position 263). Four
overlapping PCR products, covering all of the coding region, were amplified from the
cDNAs derived from the lymphoblasts of six control individuals, and entirely sequenced
with the aim of verifying the sequence of the SPG4 transcript. Aligning the sequences of
all the PCR and RACE-PCR products made it possible to reconstitute a 3263 bp
sequence comprising a 1848 bp open reading frame preceded by a 125 bp untranslated 5'
region (5' UTR for "5' UnTranslated Region") and followed by a 1290 bp 3' UTR
including a polyadenylation site between nt positions 3227 and 3232, ~35 bp upstream of the
polyA tail (fig. 2). Comparing the sequence of the SPG4 cDNA with the EST databanks
made it possible to detect significant homology with 6 human ESTs, including
EST N47973 which contains a more extended 3' noncoding region (+ 180 bp) comprising
a second polyadenylation site. The translation initiation site was identified by the presence
of a Kozak consensus sequence (CTGTGAatgA) defined as a "suitable context" for
translation initiation given that a purine is located 3 nt upstream of the initiator ATG, itself
preceded by a STOP codon. The 3263 bp cDNA sequence is identical to the transcribed
sequence deduced from the 17 exons of the SPG4 gene. The analysis of the sequence of
the 5' region using the TSSG and TSSW computer programs suggests the presence of a
promoter sequence of the TATA box type located 43 bp upstream of nt position 1 of
exon 1.
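The segment lengths quoted for the reconstituted cDNA can be cross-checked with a few lines of arithmetic. This is a sketch under two stated assumptions that the text does not spell out: that the open reading frame begins immediately after the 125 bp 5' UTR, and that the 1848 bp figure counts sense codons only. The function names are hypothetical.

```python
# Segment lengths of the reconstituted SPG4 cDNA, in bp, as given in the text.
UTR5_BP = 125
ORF_BP = 1848
UTR3_BP = 1290
CDNA_BP = 3263


def orf_coordinates(utr5_bp: int, orf_bp: int) -> tuple[int, int]:
    """1-based, inclusive nt positions of the first and last ORF nucleotide,
    assuming the ORF starts right after the 5' UTR."""
    start = utr5_bp + 1
    return start, start + orf_bp - 1


def codon_count(orf_bp: int) -> int:
    """Number of complete codons; the ORF length must be a multiple of 3."""
    if orf_bp % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3


orf_start, orf_end = orf_coordinates(UTR5_BP, ORF_BP)
segments_add_up = UTR5_BP + ORF_BP + UTR3_BP == CDNA_BP
print(segments_add_up, orf_start, orf_end, codon_count(ORF_BP))
```

Under these assumptions the three segments sum to the full 3263 bp and the open reading frame corresponds to 616 codons.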

Example 4 : Mutations in the SPG4 gene

Heterozygous mutations were sought in the SPG4 cDNA originating from
lymphoblasts of 14 patients derived from SPG4 locus-linked families (1 affected individual
per family). Four overlapping PCR fragments, I, II, III and IV, covering the open reading
frame of the SPG4 cDNA, were amplified and sequenced in the 14 patients, and also in 6
healthy control individuals. The agarose gel electrophoresis of PCR fragment IV showed
three bands of equal intensity in 3 patients from families 2992, 5226 and 5330 originating
from the same region of Switzerland, which suggested a microdeletion or a mutation
of a splice site; the two additional bands were not present in 2 healthy individuals derived
from families 2992 and 5330 (fig. 3A). The genomic sequence of exon 16 revealed a
heterozygous A->G mutation of the splice acceptor site (AG) of intron 15 in the affected
individuals of these three families (fig. 3B); this mutation causes the loss of exon 16,
followed by a reading frame shift in the abnormal transcript. None of the healthy members,
including spouses, carries this mutation of the splice site. The identification of
the same mutation in all the affected members of these three Swiss families demonstrates
the existence of a common ancestor, which had already been suggested as probable by the
study of the haplotypes.
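Why the loss of exon 16 shifts the reading frame follows from its size: an insertion or deletion within the coding sequence preserves the frame only if its length is a multiple of 3. The sketch below applies that rule to the 41 bp size quoted for exon 16 in Example 2; the function name is hypothetical and the 3 bp deletion is an invented contrast case, not a mutation reported here.

```python
def shifts_reading_frame(net_length_change_bp: int) -> bool:
    """True when a coding-sequence insertion (positive) or deletion (negative)
    changes the length by a number of bp that is not a multiple of 3."""
    return net_length_change_bp % 3 != 0


EXON_16_BP = 41  # size of exon 16 as quoted in Example 2

# Skipping exon 16 removes 41 bp from the transcript.
exon16_skip_shifts = shifts_reading_frame(-EXON_16_BP)

# Hypothetical contrast case: a 3 bp deletion keeps the frame.
in_frame_deletion_shifts = shifts_reading_frame(-3)

print(exon16_skip_shifts, in_frame_deletion_shifts)
```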

Three point mutations, 1210C->G, 1468G->A and 1620C->T, which introduce
amino acid substitutions into the protein sequence (S362C, C448Y and R499C), were
respectively revealed by sequencing PCR fragments III and IV in the affected individuals
of families 624, 4014 and 618. These three substitutions all involve a cysteine residue,
resulting in the loss or gain of a cysteine in the protein sequence. A 1 bp deletion,
1520delT, which creates a STOP codon resulting in a truncated protein
composed of 465 amino acids (aa), was detected in the affected individuals of family A.
None of the five mutations summarized in table 4 hereinafter was found in the control
individuals tested, whether healthy siblings or spouses from the
seven families analyzed herein. These five mutations significantly affect the protein
sequence in a highly conserved domain, the AAA cassette (23), which is composed of
several protein motifs presumed to be responsible for the ATPase activity in all the
members of the AAA family.

In addition to the five mutations described above, searches for heterozygous
mutations, carried out on patients suffering from AD-HSP derived from 36 other families,
made it possible to reveal 34 other mutations which modify or are likely to modify the
product of expression of the SPG4 gene.

The characteristics of these 34 other mutations are summarized in table 5
hereinafter, into which the first five mutations mentioned above have also been inserted.

Table 5

Mutations in SPG4 in the patients suffering from AD-HSP

Family   Location    Mutation (a)      Amino acid change (b)          Consequence

624      exon 7      1210 C->G         S362C                          missense
6958     exon 8      1233 G->A         G370R                          missense
214      exon 8      1267 T->G         F381C                          missense
1002     exon 8      1283 T->G         N386K                          missense
027      exon 8      1288 A->G         K388R                          missense
019      exon 10     1401 C->G         L426V                          missense
4014     exon 11     1468 G->A         C448Y                          missense
148      exon 11     1504 G->T         R460L                          missense
618      exon 13     1620 C->T         R499C                          missense
636      exon 15     1788 G->A         D555N                          missense
627      exon 15     1792 C->T         A556V                          missense

2971     exon 3      702 C->T          Q193STOP                       nonsense
3655     exon 5      873 A->T          K229STOP                       nonsense
1010     exon 5      907 C->A          S261STOP                       nonsense
3938     exon 5      932 C->G          Y269STOP                       nonsense
6922     exon 10     1416 C->T         R431STOP                       nonsense
616      exon 10     1416 C->T         R431STOP                       nonsense
605      exon 15     1809 C->T         R562STOP                       nonsense

030      exon 2      578-579insA       PTC + 2 aa                     shift + nonsense
615      exon 5      852del11          PTC + 18 aa                    shift + nonsense
042      exon 5      882-883insA       PTC + 12 aa                    shift + nonsense
032      exon 5      906delT           PTC + 17 aa                    shift + nonsense
189      exon 9      1299delG          PTC + 3 aa                     shift + nonsense
3686     exon 9      1340del5          PTC + 35 aa                    shift + nonsense
625      exon 9      1340del5          PTC + 35 aa                    shift + nonsense
A        exon 11     1520delT          PTC + 7 aa                     shift + nonsense
115      exon 12     1574delGG         PTC + 2 aa                     shift + nonsense
3266     exon 13     1634del22         PTC + 18 aa                    shift + nonsense
149      exon 14     1684-1685insTT    PTC + 9 aa                     shift + nonsense
645      exon 14     1685del4          PTC + 7 aa                     shift + nonsense

029      intron 4    808-2 a->g        ?                              splice site mutation
162      intron 6    1129+2 t->g       ?                              splice site mutation
125      intron 7    1223+1 g->t       ?                              splice site mutation
143      intron 8    1299+1 g->a       ?                              splice site mutation
1620     intron 11   1538+5 g->a       (PTC + 6 aa)                   loss of exon 11 + shift
1006     intron 11   1538+3 del4       ?                              splice site mutation
1605     intron 13   1661+1 g->t       ?                              splice site mutation
1012     intron 13   1662-2 a->t       ?                              splice site mutation
1626     intron 15   1812+1 g->a       ?                              splice site mutation
2992     intron 15   1813-2 a->g       Δ aa564 -> aa576 (PTC + 7 aa)  loss of exon 16 + shift
5226     intron 15   1813-2 a->g       Δ aa564 -> aa576 (PTC + 7 aa)  loss of exon 16 + shift
5330     intron 15   1813-2 a->g       Δ aa564 -> aa576 (PTC + 7 aa)  loss of exon 16 + shift
1611     intron 16   1853+1 g->a                                      splice site mutation

(a) The nt positions refer to the sequence of the SPG4 cDNA. (b) The aa positions refer to the spastin sequence.
The exon bases are indicated in upper case, those of the introns in lower case. PTC + n aa: "premature
termination codon" at n amino acids downstream of the mutation.
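Because the nucleotide positions in Table 5 are numbered on the cDNA while the amino acid positions are numbered on spastin, the two numbering schemes can be related through the length of the 5' UTR. The sketch below assumes that the initiator ATG occupies the three nucleotides immediately following the 125 bp 5' UTR described in Example 3; `codon_number` is a hypothetical helper, and the four pairs are copied from the missense rows of the table.

```python
UTR5_BP = 125  # length of the 5' UTR stated for the SPG4 cDNA


def codon_number(cdna_nt_pos: int, utr5_bp: int = UTR5_BP) -> int:
    """Map a 1-based cDNA nucleotide position to a 1-based codon number,
    assuming the initiator ATG starts right after the 5' UTR."""
    offset = cdna_nt_pos - utr5_bp - 1  # 0-based offset within the ORF
    if offset < 0:
        raise ValueError("position lies in the 5' UTR")
    return offset // 3 + 1


# (cDNA nt position, amino acid change) pairs copied from Table 5.
MISSENSE = [(1210, "S362C"), (1468, "C448Y"), (1620, "R499C"), (1788, "D555N")]

for nt_pos, change in MISSENSE:
    reported = int(change[1:-1])  # residue number inside e.g. "S362C"
    print(nt_pos, change, codon_number(nt_pos) == reported)
```

For these four rows the computed codon number agrees with the residue number given in the table.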

Example 5 : Analysis of the protein sequence of spastin

The open reading frame of SPG4 encodes a 616 aa protein which we have named
spastin and the molecular weight of which is approximately 67.2 kDaltons (kD). The
comparison of this amino acid sequence with the protein databases, using the BLAST
programs, made it possible to reveal a region of strong homology with several members of
the AAA family, at the C-terminal end of spastin. The "typical" motifs of the AAA family,
encompassed in the AAA cassette, are located between aa positions 342 and 599 (see
fig. 2) according to the sequence comparisons in the ProDom and Prosite protein domain
databases. The three conserved typical domains, namely the Walker A and B motifs and
the minimum consensus motif of the AAA proteins, are located in the AAA cassette at
aa positions 382-389, 437-442 and 480-498, respectively (fig. 2). The Walker A motif,
"GPPGNGKT", also called P-loop, which corresponds to the ATP-binding domain, and the
B motif, "IIFIDE", are highly conserved among all the members of the AAA family, including
spastin.
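The two Walker motifs quoted above can be checked against commonly used consensus patterns. The sketch below is illustrative: the Walker A pattern is the PROSITE P-loop pattern PS00017, [AG]-x(4)-G-K-[ST], whereas the Walker B pattern (four hydrophobic residues followed by D-E) and the particular set of residues treated as hydrophobic are assumptions made for this example, not definitions taken from the text.

```python
import re

# PROSITE PS00017 (P-loop / Walker A): [AG]-x(4)-G-K-[ST]
WALKER_A = re.compile(r"[AG].{4}GK[ST]")
# Walker B written as four hydrophobic residues followed by D-E; the exact
# hydrophobic set used here is an assumption made for this sketch.
WALKER_B = re.compile(r"[AVLIMFWY]{4}DE")

# Motif sequences and 1-based aa ranges quoted in the text for spastin.
MOTIFS = {
    "Walker A": ("GPPGNGKT", (382, 389), WALKER_A),
    "Walker B": ("IIFIDE", (437, 442), WALKER_B),
}


def motif_is_consistent(seq: str, span: tuple[int, int], pattern: re.Pattern) -> bool:
    """The motif must match the consensus and fill the quoted range exactly."""
    first, last = span
    return pattern.fullmatch(seq) is not None and len(seq) == last - first + 1


results = {name: motif_is_consistent(*entry) for name, entry in MOTIFS.items()}
print(results)
```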

The comparison of the AAA cassettes present in 150 proteins of this ATPase
family, derived from organisms which are very far apart in evolution, made it possible to
classify this set of proteins into several subgroups, as a function of the number of AAA
cassettes identified (1 or 2) and of the sequence homologies between these various
cassettes (23). Among all the proteins of the AAA family, spastin shows stronger
homology with a particular subclass of the AAAs, and more specifically with the following
proteins, most of which were identified through the complete sequencing of the genome of
the organism in question: two proteins of Caenorhabditis elegans, O16299 and Q18128;
two subunits of the 26S proteasome of Saccharomyces cerevisiae, Yta6p (Q02845) and
TBP6 (P40328) (21); a subunit of the proteasome of Schizosaccharomyces pombe
(O43078); the SAP1 (P39955) and END13 (P52917) proteins of S. cerevisiae and the
murine SKD1 protein (P46467) (22). The multiple alignment of these 8 proteins with
spastin is represented in fig. 4A. Over the 257 amino acids encompassing the AAA cassette
(aa positions 342-599), spastin shows 52%, 51% and 50% sequence identity with the
Yta6p (Q02845) yeast protein, the O16299 nematode protein and the TBP6 (P40328)
yeast protein, respectively. Similar results were obtained by analyzing the protein
sequence of spastin in the ProDom database, which showed the existence of three
domains of homology (named 92, 179 and 6226, and corresponding to aa positions 342-409,
411-509 and 512-599) found in the putative subunits of the 26S proteasome of yeast.

In addition, the members of this AAA subgroup most commonly contain motifs of the
leucine-zipper type, two of which could be detected in the protein sequence of spastin at
aa positions 50-78 and 508-529, by analyzing the sequence in the Prosite database (see
fig. 2). This analysis was also able to predict the presence of a dimerization motif of the
helix-loop-helix type, located between aa positions 478 and 486.

The comparison of the protein sequence of spastin with those of the mitochondrial
metalloproteases, such as the AFG3, RCA1 and YME1 yeast proteins, and also
paraplegin, which is implicated in a rare form of AR-HSP, shows that the homology
between these five members of the AAA family is limited to the 257 aa region
encompassing the AAA cassette (fig. 4B). In this region, the sequence identity between
spastin and paraplegin is only 29%, whereas paraplegin and the AFG3 yeast protein are
57% identical over this same portion of the protein sequence. This sequence comparison
suggests that spastin does not belong to the same AAA subgroup as paraplegin and the other
mitochondrial metalloproteases. In addition, the computer analysis of the spastin
sequence using the PSORT II program, which makes it possible to predict the subcellular
location of proteins, appears to indicate that spastin is a nuclear protein. A possible
nuclear localization signal (NLS), RGKKK, was revealed between aa positions 7 and 11,
whereas no signal peptide characteristic of importation into mitochondria could be
detected, unlike what had been observed for paraplegin.

Example 6 : Expression profiles for SPG4 and for its murine ortholog Spg4

The comparison of the nucleic acid sequence of SPG4 with the EST databanks
made it possible to detect several human, murine and rat ESTs showing strong homology
with SPG4. The mouse blastocyst and E8 embryo cDNA clones corresponding to two of
the murine ESTs, AA560327 and AA107866, were obtained from the IMAGE consortium
and entirely sequenced. The assembly of the sequences of these cDNA clones made it
possible to reconstitute a 1689 bp consensus sequence including a 1514 bp incomplete
open reading frame. The comparison between the human SPG4 cDNA and this mouse
cDNA showed that the murine transcript lacks approximately 460 bp at the 5' end,
including the translation initiation codon. The mouse open reading frame is followed by a
175 bp 3' noncoding region (3' UTR) containing a polyadenylation site located ~20 bp
upstream of the polyA tail (fig. 5). The nucleic acid sequence of SPG4 and the protein
sequence of human spastin show 89% (between nt positions 460 and 1982) and 96%
(between aa positions 113 and 616) identity, respectively, with the mouse cDNA and
deduced protein sequences. This considerable degree of homology makes it possible to
affirm that this mouse transcript corresponds to the murine ortholog of SPG4, which was
therefore named Spg4.



The hybridization of Northern blots comprising the mRNAs of various human and
murine tissues (Clontech) with the SPG4 and Spg4 cDNA clones did not give any
convincing results, except a very weak band corresponding to a 2.5 kb transcript in the
mouse testis after exposure for 10 days. Because of the low level of expression of this
gene, the expression profiles for SPG4 and Spg4 were determined by PCR experiments
on normalized collections of cDNA originating from various adult and foetal tissues (see
fig. 6). The murine Spg4 gene is expressed ubiquitously in the adult tissues of mice, and
also from the E7 stage to the E17 stage of mouse embryos (fig. 6A). Higher expression of
Spg4 was detected in the liver, skeletal muscle and testes, and also at the E15 stage of
embryos. The early expression of Spg4 during embryonic development was confirmed by
the presence of ESTs originating from blastocyst, E8 embryo and embryonic carcinoma
cDNA libraries in the public EST databanks. The human SPG4 gene is likewise
expressed ubiquitously in adult (fig. 6B) and foetal (fig. 6C) tissues, with perhaps more
marked expression in foetal brain.

Example 7 : No oxidative phosphorylation impairment in SPG4 locus-linked AD-HSP

In order to determine whether spastin mutations induced an oxidative
phosphorylation (OXPHOS) impairment in mitochondria, in the same way as had been
observed for paraplegin, a muscle biopsy was performed on a patient from one of the
SPG4 locus-linked AD-HSP families. The morphological and histo-enzymatic analyses of
this muscle biopsy did not reveal any muscle fibres of the RRF (for "ragged red fibre")
type, characteristic of OXPHOS impairments in mitochondria. The fact that all the muscle
fibres appear to be normal, and also the prediction of a nuclear localization for spastin,
seem to indicate that SPG4 locus-linked AD-HSP is not a mitochondrial disease of the
OXPHOS type, unlike SPG7 locus-linked AR-HSP.

Using a positional cloning approach based on sequencing a 1.5 Mb region, we
have identified the SPG4 (or SPAST) gene responsible for the most common form of
AD-HSP, previously located on chromosomal bands 2p21-p22. Thirty-nine mutations
which modify or are likely to modify the gene product, named spastin, could be detected in
the affected individuals from forty-one families with AD-HSP linked to the SPG4
locus. Spastin is a novel member of the AAA protein family, which appears to have a
nuclear localization and which shows strong homology with the subunits of the 26S
proteasome of yeast. Despite great homology restricted to a domain of 230 to 250 aa,
termed AAA cassette, the many members of this protein family can participate in very
varied cellular mechanisms, such as the transport of proteins in vesicles, cell cycle
regulation, organelle biogenesis, control of transcription, etc. However, all these
cellular mechanisms involve the assembly, the functioning or the degradation of protein
complexes, which suggests that the members of the AAA family are so-called "chaperone"
proteins.

CLAIMS

1. Purified or isolated nucleic acid of the SPG4 gene, characterized in that it
comprises at least 15 consecutive nucleotides of a sequence chosen from the group
comprising the sequence SEQ ID No. 1, the nucleic acid sequences which are
homologues or variants of the nucleic acid of sequence SEQ ID No. 1, the sequence
complementary thereto and the sequence of their corresponding RNA.

2. Purified or isolated nucleic acid according to Claim 1, characterized in that it
comprises a sequence chosen from the group comprising the sequence SEQ ID No. 1,
the sequence SEQ ID No. 2, the sequence SEQ ID No. 72, the nucleic acid sequences
which are homologues or variants of the sequences SEQ ID No. 1, SEQ ID No. 2 or
SEQ ID No. 72, the sequence complementary thereto and the sequence of their
corresponding RNA.

3. Purified or isolated nucleic acid according to Claim 1 or 2, characterized in
that it comprises at least one mutation the position and the nature of which are identified
in Table 5.

4. Probe or primer, characterized in that it comprises a sequence of a nucleic
acid according to one of Claims 1 to 3.

5. Probe or primer according to Claim 4, characterized in that its sequence is
chosen from the sequences SEQ ID No. 4 to SEQ ID No. 71.

6. Splice acceptor or donor site, characterized in that its sequence is chosen
from the sequences of splice acceptor or donor sites identified in Table 3.

7. Method for screening cDNA or genomic DNA libraries, or for cloning
isolated genomic or cDNA encoding spastin, characterized in that it uses a nucleic acid
sequence according to one of Claims 1 to 6.

8. Method according to Claim 7, for identifying the genomic or cDNA
sequence of the SPG4 gene of mammals, in particular of mice.

9. Method for identifying a mutation carried by the human SPG4 gene,
characterized in that it uses a nucleic acid sequence according to one of Claims 1 to 6.

10. Method according to Claim 9, for identifying a mutation responsible for
autosomal dominant hereditary spastic paraplegia.

11. Method for identifying the nucleic acid sequences which promote and/or
regulate the expression of the SPG4 gene, characterized in that it uses a nucleic acid
sequence according to one of Claims 1 to 6.

12. Nucleic acid identified using a method according to one of Claims 8 to 12.

13. Polypeptide encoded by a nucleic acid according to one of Claims 1 to 3
and 12.

14. Polypeptide according to Claim 13, characterized in that it comprises an
amino acid sequence chosen from the group comprising the sequence SEQ ID No. 3, the
sequence SEQ ID No. 73, the sequences of polypeptides which are homologues and
variants of the polypeptide of sequence SEQ ID No. 3 or SEQ ID No. 73, and the
sequences of the fragments thereof of at least 10 consecutive amino acids.

15. Polypeptide according to Claim 14, characterized in that it comprises an
amino acid sequence chosen from the group comprising the sequence SEQ ID No. 3 and
the sequence SEQ ID No. 73, which sequences carrying at least one of the mutations the
nature and location of which are identified in Table 5, and the sequences of the fragments
thereof of at least 10 consecutive amino acids.

16. Cloning and/or expression vector containing a nucleic acid sequence
according to one of Claims 1 to 3, and 12.

17. Vector according to Claim 16, characterized in that it includes the elements
required for its expression in a host cell.

18. Host cell transformed with a vector according to Claim 16 or 17.

19. Mammal, except a human, characterized in that it comprises a cell
according to Claim 18.

20. Mammal, except a human, according to Claim 16, comprising a
transformed cell, characterized in that the sequence of at least one of the two alleles of
the SPG4 gene contains at least one of the mutations the position and nature of which are
identified in Table 5 or identified using a method according to Claim 10 or 11.

21. Use of a nucleic acid sequence according to one of Claims 4, 5 and 12, as
a probe or primer, for detecting and/or amplifying nucleic acid sequences.

22. Use of a nucleic acid sequence according to one of Claims 1 to 6, and 12,
for screening a genomic or cDNA library.

23. Use of a nucleic acid sequence according to one of Claims 1 to 3 and 12,
for producing a recombinant or synthetic polypeptide.

24. Method for producing a recombinant polypeptide, characterized in that a
transformed cell according to Claim 18 is cultured under conditions which allow the
expression of said recombinant polypeptide, and in that said recombinant polypeptide is
recovered.

25. Polypeptide, characterized in that it is obtained using a method according to
Claim 24.

26. Mono- or polyclonal antibodies or their fragments, chimeric antibodies or
immunoconjugates, characterized in that they are capable of specifically recognizing a
polypeptide according to one of Claims 13 to 15, and 25.

27. Method for detecting and/or purifying a polypeptide according to one of
Claims 13 to 15, and 25, characterized in that it uses an antibody according to Claim 26.

28. Method for genotypic diagnosis of AD-HSP associated with the SPG4 gene,
characterized in that a nucleic acid sequence according to one of Claims 1 to 6 and 12 is
used.

29. Method for genotypic diagnosis of AD-HSP associated with the presence of
at least one mutation on a sequence of the SPG4 gene, using a biological sample from a
patient, characterized in that it includes the following steps:

a) where appropriate, isolation of the genomic DNA from the biological sample to be
analyzed, or production of cDNA from the RNA of the biological sample;

b) specific amplification of said DNA sequence of the SPG4 gene likely to contain a
mutation, using primers according to either of Claims 4 and 5 or a nucleic acid
according to Claim 12;

c) analysis of the amplification products obtained and comparison of their sequence with
the corresponding normal sequence of the SPG4 gene.

30. Method for diagnosing AD-HSP associated with abnormal expression of a
polypeptide encoded by the SPG4 gene, characterized in that one or more antibodies
according to Claim 26 is (are) brought into contact with the biological material to be tested,
under conditions which allow the possible formation of specific immunological complexes
between said polypeptide and said antibody or antibodies, and in that the immunological
complexes possibly formed are detected and/or quantified.

31. Method for selecting a chemical or biochemical compound which is capable
of interacting directly or indirectly with a polypeptide according to one of Claims 13 to 15,
and 25, or with a nucleic acid according to one of Claims 1 to 6, and 12, and/or which
makes it possible to modulate the expression or the activity of these polypeptides,
characterized in that it comprises bringing a nucleic acid sequence according to one of
Claims 1 to 6, and 12, a polypeptide according to one of Claims 13 to 15, and 25, a vector
according to either of Claims 16 and 17, a cell according to Claim 18, a mammal
according to either of Claims 19 and 20 or an antibody according to Claim 26 into contact
with a candidate compound, and detecting a modification of the activity of said
polypeptide.

32. Use of a nucleic acid sequence according to one of Claims 1 to 6, and 12,
of a polypeptide according to one of Claims 13 to 15, and 25, of a vector according to
either of Claims 16 and 17, of a cell according to Claim 18, of a mammal according to
either of Claims 19 and 20 or of an antibody according to Claim 26, for studying the
expression or the activity of the SPG4 gene.

CLAIMS

1. Purified or isolated nucleic acid of the SPG4 gene, characterized in that it
comprises at least 15 consecutive nucleotides of a sequence chosen from the group
comprising the sequence SEQ ID No. 1, the nucleic acid sequences which are
homologues or variants of the nucleic acid of sequence SEQ ID No. 1, the sequence
complementary thereto and the sequence of their corresponding RNA.

2. Purified or isolated nucleic acid according to Claim 1, characterized in that it
comprises a sequence chosen from the group comprising the sequence SEQ ID No. 1,
the sequence SEQ ID No. 2, the sequence SEQ ID No. 72, the nucleic acid sequences
which are homologues or variants of the sequences SEQ ID No. 1, SEQ ID No. 2 or
SEQ ID No. 72, the sequence complementary thereto and the sequence of their
corresponding RNA.

3. Purified or isolated nucleic acid according to Claim 1 or 2, characterized in
that it comprises a mutation corresponding to a natural polymorphism in humans.

4. Probe or primer, characterized in that it comprises a sequence of a nucleic
acid according to one of Claims 1 to 3.

5. Probe or primer according to Claim 4, characterized in that its sequence is
chosen from the sequences SEQ ID No. 4 to SEQ ID No. 71.

6. Splice acceptor or donor site, characterized in that its sequence is chosen
from the sequences SEQ ID No. 74 to SEQ ID No. 105.

7. Method for screening cDNA or genomic DNA libraries, or for cloning
isolated genomic or cDNA encoding spastin, characterized in that it uses a nucleic acid
sequence according to one of Claims 1 to 6.

8. Method according to Claim 7, for identifying the genomic or cDNA
sequence of the SPG4 gene of mammals, in particular of mice.

9. Method for identifying a mutation carried by the human SPG4 gene,
characterized in that it uses a nucleic acid sequence according to one of Claims 1 to 6.

10. Method according to Claim 9, for identifying a mutation responsible for
autosomal dominant hereditary spastic paraplegia.

11. Method for identifying the nucleic acid sequences which promote and/or
regulate the expression of the SPG4 gene, characterized in that it uses a nucleic acid
sequence according to one of Claims 1 to 6.

12. Nucleic acid identified using a method according to one of Claims 8 to 11.

13. Polypeptide encoded by a nucleic acid according to one of Claims 1 to 3
and 12.

14. Polypeptide according to Claim 13, characterized in that it comprises an 
amino acid sequence chosen from the group comprising the sequence SEQ ID No. 3, the 

5 sequence SEQ ID No. 73, the sequences of polypeptides which are homologues and 
variants of the polypeptide of sequence SEQ ID No. 3 or SEQ ID No. 73, and the 
sequences of the fragments thereof of at least 1 0 consecutive amino acids. 

15. Polypeptide according to Claim 14, characterized in that it comprises an 
amino acid sequence chosen from the group comprising the sequence SEQ ID No. 3 and 

10 the sequence SEQ ID No. 73, which sequences carrying at least one of the mutations the 
nature and location of which are identified in Table 5, and the sequences of the fragments 
thereof of at least 10 consecutive amino acids. 

16. Cloning and/or expression vector containing a nucleic acid sequence 
according to one of Claims 1 to 3, and 12. 

17. Vector according to Claim 16, characterized in that it includes the elements
required for its expression in a host cell.

18. Host cell transformed with a vector according to Claim 16 or 17. 

19. Mammal, except a human, characterized in that it comprises a cell 
according to Claim 18. 

20. Mammal, except a human, according to Claim 19, comprising a
transformed cell, characterized in that the sequence of at least one of the two alleles of
the SPG4 gene contains at least one of the mutations the position and nature of which are
identified in Table 5 or identified using a method according to Claim 9 or 10.

21. Use of a nucleic acid sequence according to one of Claims 4, 5 and 12, as
a probe or primer, for detecting and/or amplifying nucleic acid sequences.

22. Use of a nucleic acid sequence according to one of Claims 1 to 6, and 12, 
for screening a genomic or cDNA library. 

23. Use of a nucleic acid sequence according to one of Claims 1 to 3 and 12, 
for producing a recombinant or synthetic polypeptide. 

24. Method for producing a recombinant polypeptide, characterized in that a
transformed cell according to Claim 18 is cultured under conditions which allow the
expression of said recombinant polypeptide, and in that said recombinant polypeptide is
recovered.

25. Polypeptide, characterized in that it is obtained using a method according to
Claim 24.

26. Mono- or polyclonal antibodies or their fragments, chimeric antibodies or
immunoconjugates, characterized in that they are capable of specifically recognizing a 
polypeptide according to one of Claims 13 to 15, and 25. 

27. Method for detecting and/or purifying a polypeptide according to one of 
Claims 13 to 15, and 25, characterized in that it uses an antibody according to Claim 26. 

28. Method for genotypic diagnosis of AD-HSP associated with the SPG4 gene, 
characterized in that a nucleic acid sequence according to one of Claims 1 to 6 and 12 is 
used. 

29. Method for genotypic diagnosis of AD-HSP associated with the presence of 
at least one mutation on a sequence of the SPG4 gene, using a biological sample from a 
patient, characterized in that it includes the following steps: 

a) where appropriate, isolation of the genomic DNA from the biological sample to be 
analyzed, or production of cDNA from the RNA of the biological sample; 

b) specific amplification of said DNA sequence of the SPG4 gene likely to contain a 
mutation, using primers according to either of Claims 4 and 5 or a nucleic acid 
according to Claim 12; 

c) analysis of the amplification products obtained and comparison of their sequence with 
the corresponding normal sequence of the SPG4 gene. 

30. Method for diagnosing AD-HSP associated with abnormal expression of a 
polypeptide encoded by the SPG4 gene, characterized in that one or more antibodies 
according to Claim 26 is (are) brought into contact with the biological material to be tested, 
under conditions which allow the possible formation of specific immunological complexes 
between said polypeptide and said antibody or antibodies, and in that the immunological 
complexes possibly formed are detected and/or quantified. 

31. Method for selecting a chemical or biochemical compound which is capable 
of interacting directly or indirectly with a polypeptide according to one of Claims 13 to 15, 
and 25, or with a nucleic acid according to one of Claims 1 to 6, and 12, and/or which 
makes it possible to modulate the expression or the activity of these polypeptides, 
characterized in that it comprises bringing a nucleic acid sequence according to one of 
Claims 1 to 6, and 12, a polypeptide according to one of Claims 13 to 15, and 25, a vector 
according to either of Claims 16 and 17, a cell according to Claim 18, a mammal 
according to either of Claims 19 and 20 or an antibody according to Claim 26 into contact 
with a candidate compound, and detecting a modification of the activity of said 
polypeptide.

32. Use of a nucleic acid sequence according to one of Claims 1 to 6, and 12,
of a polypeptide according to one of Claims 13 to 15, and 25, of a vector according to 
either of Claims 16 and 17, of a cell according to Claim 18, of a mammal according to 
either of Claims 19 and 20 or of an antibody according to Claim 26, for studying the 
expression or the activity of the SPG4 gene. 
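
As an illustration of step c) of Claim 29, the comparison of an amplified sequence with the corresponding normal sequence can be sketched in Python as follows. This is a minimal sketch and not part of the claimed method: it assumes that the two sequences are already aligned and of equal length, it reports base substitutions only, and the function name `find_substitutions` and the toy sequences are hypothetical and are not taken from SEQ ID No. 1.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumption: both sequences are pre-aligned and have the same length,
    so insertions and deletions are outside the scope of this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    ref, smp = reference.upper(), sample.upper()
    return [(i + 1, r, s) for i, (r, s) in enumerate(zip(ref, smp)) if r != s]


# Hypothetical toy sequences for illustration only.
normal_sequence = "ATGGCCAAGT"
patient_sequence = "ATGGTCAAGT"
print(find_substitutions(normal_sequence, patient_sequence))  # [(5, 'C', 'T')]
```

An empty list indicates that no substitution was found over the compared region; detecting insertions, deletions or splice-site changes would require a proper alignment step before the comparison.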
CLAIMS

1. Purified or isolated nucleic acid of the SPG4 gene, characterized in that it
comprises at least 15 consecutive nucleotides of a sequence chosen from the group
comprising the sequence SEQ ID No. 1, the nucleic acid sequences which are
homologues or variants of the nucleic acid of sequence SEQ ID No. 1, the sequence
complementary thereto and the sequence of their corresponding RNA.

2. Purified or isolated nucleic acid according to Claim 1, characterized in that it
comprises a sequence chosen from the group comprising the sequence SEQ ID No. 1,
the sequence SEQ ID No. 2, the sequence SEQ ID No. 72, the nucleic acid sequences
which are homologues or variants of the sequences SEQ ID No. 1, SEQ ID No. 2 or
SEQ ID No. 72, the sequence complementary thereto and the sequence of their
corresponding RNA.

3. Purified or isolated nucleic acid according to Claim 1 or 2, characterized in
that it comprises a mutation corresponding to a natural polymorphism in humans.

4. Probe or primer, characterized in that it comprises a sequence of a nucleic 
acid according to one of Claims 1 to 3. 

5. Probe or primer according to Claim 4, characterized in that its sequence is
chosen from the sequences SEQ ID No. 4 to SEQ ID No. 71.

6. Splice acceptor or donor site, characterized in that it comprises a sequence
of a nucleic acid according to Claim 1 chosen from the sequences SEQ ID No. 74 to
SEQ ID No. 105.

7. Method for screening cDNA or genomic DNA libraries, or for cloning
isolated genomic or cDNA encoding spastin, characterized in that it uses a nucleic acid
sequence according to one of Claims 1 to 6.

8. Method according to Claim 7, for identifying the genomic or cDNA 
sequence of the SPG4 gene of mammals, in particular of mice. 

9. Method for identifying a mutation carried by the human SPG4 gene, 
characterized in that it uses a nucleic acid sequence according to one of Claims 1 to 6. 

10. Method according to Claim 9, for identifying a mutation responsible for
autosomal dominant hereditary spastic paraplegia.

11. Method for identifying the nucleic acid sequences which promote and/or 
regulate the expression of the SPG4 gene, characterized in that it uses a nucleic acid 
sequence according to one of Claims 1 to 6.

[Figure: pairwise nucleotide sequence alignment, upper sequence positions 1 to 1523 against lower sequence positions 460 to 1982, followed by unaligned sequence continuing to position 1689. The sequence text is not legible in this copy.]